Data Analysis for Fibrosis paper - Loading and initial processing of stroma

Author

James Reineke and Matt Cannon

Published

September 17, 2024

Sample summary

  • Our data
    • stuff here

Load packages

Code
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()    masks stats::filter()
✖ dplyr::lag()       masks stats::lag()
✖ lubridate::stamp() masks cowplot::stamp()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code

Attaching package: 'patchwork'

The following object is masked from 'package:cowplot':

    align_plots
Code
Loading required package: SeuratObject
Loading required package: sp

Attaching package: 'SeuratObject'

The following objects are masked from 'package:base':

    intersect, t
Code
library(sctransform)
library(rrrSingleCellUtils)
Loading required package: nichenetr
Code
Loading required package: Hmisc

Attaching package: 'Hmisc'

The following object is masked from 'package:Seurat':

    Key

The following object is masked from 'package:SeuratObject':

    Key

The following objects are masked from 'package:dplyr':

    src, summarize

The following objects are masked from 'package:base':

    format.pval, units

Loading required package: data.table

Attaching package: 'data.table'

The following objects are masked from 'package:lubridate':

    hour, isoweek, mday, minute, month, quarter, second, wday, week,
    yday, year

The following objects are masked from 'package:dplyr':

    between, first, last

The following object is masked from 'package:purrr':

    transpose
Code
Loading required package: MatrixGenerics
Loading required package: matrixStats

Attaching package: 'matrixStats'

The following object is masked from 'package:dplyr':

    count


Attaching package: 'MatrixGenerics'

The following objects are masked from 'package:matrixStats':

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
    colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
    colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
    colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
    colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
    colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
    colWeightedMeans, colWeightedMedians, colWeightedSds,
    colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
    rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
    rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
    rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
    rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
    rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
    rowWeightedSds, rowWeightedVars

Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: 'BiocGenerics'

The following object is masked from 'package:SeuratObject':

    intersect

The following objects are masked from 'package:lubridate':

    intersect, setdiff, union

The following objects are masked from 'package:dplyr':

    combine, intersect, setdiff, union

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    anyDuplicated, aperm, append, as.data.frame, basename, cbind,
    colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
    get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
    Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
    table, tapply, union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:data.table':

    first, second

The following objects are masked from 'package:lubridate':

    second, second<-

The following objects are masked from 'package:dplyr':

    first, rename

The following object is masked from 'package:tidyr':

    expand

The following object is masked from 'package:utils':

    findMatches

The following objects are masked from 'package:base':

    expand.grid, I, unname

Loading required package: IRanges

Attaching package: 'IRanges'

The following object is masked from 'package:data.table':

    shift

The following object is masked from 'package:sp':

    %over%

The following object is masked from 'package:lubridate':

    %within%

The following objects are masked from 'package:dplyr':

    collapse, desc, slice

The following object is masked from 'package:purrr':

    reduce

Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


Attaching package: 'Biobase'

The following object is masked from 'package:MatrixGenerics':

    rowMedians

The following objects are masked from 'package:matrixStats':

    anyMissing, rowMedians

The following object is masked from 'package:Hmisc':

    contents


Attaching package: 'SummarizedExperiment'

The following object is masked from 'package:Seurat':

    Assays

The following object is masked from 'package:SeuratObject':

    Assays
Code
library(SingleR)

# plan("multisession", workers = parallelly::availableCores())

# options(future.globals.maxSize = 2000 * 1024^2)

# Set random generator seed to facilitate reproducibility
set.seed(888)

Make up directory structure

Code
for directoryName in \
  output \
  output/figures \
  output/figures/spatial \
  output/figures/nichenetr/ \
  output/figures/nucleus \
  output/rdata \
  output/rdata/nucleus \
  output/de \
  output/tables/nichenetr/ \
  results \
  results/de \
  results/cell_type_annotation

do
    if [ ! -d ${directoryName} ]
    then
    mkdir -p ${directoryName}
    fi
done

Load functions

Functions to use for cell annotation

Code
#' Annotate cells
#'
#' This function annotates cells based on the input sobject.
#'
#' @param sobject The input sobject.
#'
#' @return A Seurat object with the annotated cells.
#'
#' @examples
#' annot_cells(sobject)
#'
annot_cells <- function(sobject) {
    cell_assign <- SingleR::SingleR(
        as.SingleCellExperiment(sobject),
        ref = list(GetAssayData(mouse_lung_ref), mouse_immune, mouse_rna),
        labels = list(
            mouse_lung_ref$cell_type,
            mouse_immune$label.main,
            mouse_rna$label.fine),
        aggr.ref = TRUE)
    sobject$cell_type <- cell_assign$labels
    sobject$cell_score <- cell_assign$scores %>%
        apply(MARGIN = 1, function(x) max(x, na.rm = TRUE))
    return(sobject)
}

#' Function to generate random colors for unique values in a vector
#'
#' This function takes a vector as input and generates random colors for each unique value in the vector.
#' The function uses the rainbow function to generate a set of colors and assigns a random color to each unique value in the input vector.
#' The seed parameter can be used to set the random seed for reproducibility.
#'
#' @param x A vector of values
#' @param seed An optional seed for the random number generator
#' @return A vector of colors, with one color assigned to each unique value in the input vector
#'
#' @examples
#' crazy_cols(c("A", "B", "C", "A", "B", "D"))
#' crazy_cols(c(1, 2, 3, 1, 2, 4))
#' crazy_cols(c("red", "green", "blue", "red", "green", "yellow"))
#'
crazy_cols <- function(x, seed = 1337) {
    set.seed(seed)
    sample(rainbow(n = length(unique(x))))
}

#' Calculate log fold change between two groups
#'
#' This function calculates the log fold change between two groups in a Seurat object.
#'
#' @param sobj A Seurat object
#' @param group_var The variable in the Seurat object that defines the groups
#' @param group_1 The name of the first group to compare
#' @param group_2 The name of the second group to compare
#' @param epsilon A small value to add to the denominator to avoid division by zero
#' @param assay The assay to use for the calculation
#'
#' @return A tibble with two columns: "log_fc" (the log fold change) and "gene" (the gene name)
calc_logfc <- function(sobj,
                       group_var,
                       group_1,
                       group_2,
                       epsilon = 1,
                       assay = "SCT") {
    all_obs_exp <-
        AverageExpression(sobj,
                          group.by = group_var,
                          assays = assay)[[1]] %>%
        as.data.frame()

    log_fc <-
        tibble(log_fc = log2((all_obs_exp[[group_1]] + epsilon) /
                             (all_obs_exp[[group_2]] + epsilon)),
               gene = rownames(all_obs_exp))

    return(log_fc)
}

A function to make output an sortable and copy-able datatable from a dataframe

Code
interactive_dt <- function(df, rownames = FALSE) {
    DT::datatable(data = df,
                  rownames = rownames,
                  extensions = c("FixedColumns",
                                 "Buttons"),
                  options = list(
                    pageLength = 10,
                    scrollX = TRUE,
                    scrollCollapse = TRUE,
                    dom = 'Bfrtip',
                    buttons = c('copy',
                                'csv',
                                'excel')
                  )
    )
}

Functions for recursive clustering

Code
recurluster <- function(sobj,
                        level = 1,
                        max_level = 3,
                        min_cells = 0,
                        data_type = "RNA",
                        do_plots = TRUE,
                        parallel = FALSE,
                        meta_col_base = "clust",
                        harmony = FALSE,
                        harmony_factors = NULL,
                        backup_resolution = 0.1,
                        verbose = TRUE,
                        level_messages = TRUE) {
    if (verbose || level_messages) {
        message("Clustering level ", level)
    }

    # can't do plots while parallelizing
    if (parallel && do_plots) {
        do_plots <- FALSE
        message("Cannot plot while operating in parallel")
    }

    # number of PCs needs to be less than number of cells
    pc_num <- min(50, length(Seurat::Cells(sobj)) - 1)

    # Recluster at course resolution
    temp_sobj <-
        prep_recurl_data(sobj,
                         level = level,
                         data_type = data_type,
                         parallel = parallel,
                         harmony = harmony,
                         harmony_factors = harmony_factors,
                         backup_resolution = backup_resolution,
                         verbose = verbose,
                         pc_num = pc_num)

    # Add new clusters to the Seurat object, at approprate level
    if (level == 1) {
        temp_sobj[[paste0(meta_col_base, "_", level)]] <-
            paste0(meta_col_base, "_", temp_sobj$seurat_clusters)
    } else {
        temp_sobj[[paste0(meta_col_base, "_", level)]] <-
            paste0(temp_sobj[[paste0(meta_col_base, "_", level - 1)]][, 1],
                   ".",
                   temp_sobj$seurat_clusters)
    }

    # Plot the clusters at this level
    if (do_plots) {
        Seurat::Idents(temp_sobj) <- temp_sobj[[paste0(meta_col_base,
                                                       "_",
                                                       level)]]
        print(Seurat::DimPlot(temp_sobj, pt.size = 1) +
            ggplot2::ggtitle(paste0("Level ", level))) +
            ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5))
    }

    # If more than one cluster isolate and recurluster each cluster
    # Also stop if we've reached the max level
    if (temp_sobj$seurat_clusters %>%
            as.character() %>%
            as.numeric() %>%
            max() > 0 &&
        level < max_level) {

        if (!parallel) {
            sobj_list <-
                lapply(unique(temp_sobj$seurat_clusters),
                       function(x) re_recurluster(sobject = temp_sobj,
                                                  cluster_num = x,
                                                  level = level,
                                                  min_cells = min_cells,
                                                  data_type = data_type,
                                                  do_plots = do_plots,
                                                  parallel = parallel,
                                                  verbose = verbose,
                                                  harmony = harmony,
                                                  harmony_factors = harmony_factors))
        } else {
            num_cores <- parallel::detectCores()
            sobj_list <-
                parallel::mclapply(unique(temp_sobj$seurat_clusters),
                                   function(x) re_recurluster(sobject = temp_sobj,
                                                              cluster_num = x,
                                                              level = level,
                                                              min_cells = min_cells,
                                                              data_type = data_type,
                                                              do_plots = do_plots,
                                                              parallel = parallel,
                                                              verbose = verbose,
                                                              harmony = harmony,
                                                              harmony_factors = harmony_factors),
                                   mc.cores = num_cores)
        }
        # Merge the subclusters into a single Seurat object and pull metadata
        returned_meta <-
            merge(sobj_list[[1]],
                  sobj_list[2:length(sobj_list)])@meta.data %>%
            dplyr::select(dplyr::starts_with(meta_col_base))

        sobj <- Seurat::AddMetaData(sobj, metadata = returned_meta)
    }

    # Fill in NAs in the clustser columns with parent cluster plus "_0"
    for (clust_col in grep(paste0(meta_col_base, "_"),
                           names(sobj@meta.data),
                           value = TRUE) %>%
            grep("clust_1", ., value = TRUE, invert = TRUE)) {
        clust_num <- stringr::str_remove(clust_col,
                                         paste0(meta_col_base, "_")) %>%
            as.numeric()

        sobj[[clust_col]] <-
            sobj@meta.data %>%
            as.data.frame() %>%
            dplyr::mutate({{ clust_col }} :=
                dplyr::if_else(is.na(get(clust_col)),
                               paste0(get(paste0(meta_col_base, "_",
                                                 clust_num - 1)),
                                      "0"),
                               get(clust_col))) %>%
            dplyr::pull(get(clust_col))
    }
    return(sobj)
}

prep_recurl_data <- function(sobj,
                             level = 1,
                             data_type = "RNA",
                             parallel = FALSE,
                             harmony = FALSE,
                             harmony_factors = NULL,
                             backup_resolution = 0.1,
                             verbose = TRUE,
                             pc_num = 50) {
    # Recluster at course resolution
    reduction <- "pca"

    # Need to check if within the harmony_factors metadata column there are more
    # than one factor. If there are not, skip harmony
    if (harmony) {
        n_harmony_levels <-
            sobj@meta.data[[harmony_factors]] %>%
            unique() %>%
            length()

        if (n_harmony_levels == 1) {
            harmony <- FALSE
            message("Only one level in harmony_factors, skipping harmony for this subcluster")
        }
    }

    if (data_type == "RNA") {
        find_clust_alg <- 1

        sobj <- sobj %>%
            Seurat::FindVariableFeatures(verbose = verbose) %>%
            Seurat::ScaleData(verbose = verbose) %>%
            Seurat::RunPCA(verbose = verbose,
                           npcs = pc_num,
                           approx = FALSE)

        if (harmony) {
            if (is.null(harmony_factors)) {
                stop("Must specify harmony_factors if harmony = TRUE")
            }
            reduction <- "harmony"
            sobj <-
                sobj %>%
                harmony::RunHarmony(group.by.vars = harmony_factors,
                                    verbose = verbose)
        }

        sobj <-
            sobj %>%
            Seurat::FindNeighbors(dims = 1:min(30, pc_num),
                                  reduction = reduction,
                                  verbose = verbose) %>%
            Seurat::RunUMAP(dims = 1:min(10, pc_num),
                            n.neighbors = min(30, pc_num / 2),
                            reduction = reduction,
                            verbose = verbose)

        # this should be pulled out as a function and made common to atac/rna
        if (ncol(sobj) < 60000) {
            opt_res <-
                optimize_silhouette(sobj,
                                    test_res = seq(0.05, 0.3, by = 0.05),
                                    summary_plot = FALSE) %>%
                dplyr::arrange(sil_vals * -1) %>%
                dplyr::pull(res_vals) %>%
                head(n = 1)
        } else {
            opt_res <- backup_resolution
        }

    } else if (data_type == "ATAC") {
        find_clust_alg <- 3

        sobj <-
            sobj %>%
            Signac::RunTFIDF(verbose = verbose) %>%
            Signac::FindTopFeatures(min.cutoff = "q25", verbose = verbose) %>%
            Signac::RunSVD(verbose = verbose) %>%
            Seurat::RunUMAP(reduction = "lsi",
                            dims = 2:min(30, pc_num),
                            verbose = verbose) %>%
            Seurat::FindNeighbors(dims = 1:min(30, pc_num),
                                  reduction = "lsi",
                                  verbose = verbose)
    }
    sobj <- sobj %>%
        Seurat::FindClusters(resolution = opt_res,
                             algorithm = find_clust_alg,
                             verbose = verbose)

    return(sobj)
}

re_recurluster <- function(sobject,
                           cluster_num,
                           level,
                           min_cells,
                           data_type,
                           do_plots,
                           parallel,
                           verbose,
                           harmony = FALSE,
                           harmony_factors = NULL) {
    if (length(which(sobject$seurat_clusters == cluster_num)) > min_cells) {
        if (verbose) {
            message("Processing cluster ",
                    cluster_num,
                    " at level ",
                    level)
        }
        sobj_out <-
            recurluster(subset(sobject,
                                subset = seurat_clusters == cluster_num),
                        level = level + 1,
                        min_cells = min_cells,
                        data_type = data_type,
                        do_plots = do_plots,
                        harmony = harmony,
                        harmony_factors = harmony_factors,
                        parallel = parallel)
    } else {
        sobj_out <-
            subset(sobject,
                   subset = seurat_clusters == cluster_num)
    }
    return(sobj_out)
}

Read in all data and process it for downstream analysis

List of seurat object saved to output/rdata/sobj_list.qs

Get cell type reference datasets

Reference data from GEO#GSE151974 ::: {.cell}

Code
ref_path <- "/home/gdrobertslab/lab/GenRef/sc_ref_datasets/mouse"
mouse_lung_ref <- qs::qread(paste0(ref_path, "/GSE151974/mouse_lung_ref.qs"))
mouse_immune <- celldex::ImmGenData()
see ?celldex and browseVignettes('celldex') for documentation
loading from cache
see ?celldex and browseVignettes('celldex') for documentation
loading from cache
Code
# This label is not informative
mouse_immune <-
    mouse_immune[mouse_immune$label.main != "Stromal cells", ]

mouse_rna <- celldex::MouseRNAseqData()
see ?celldex and browseVignettes('celldex') for documentation
loading from cache
see ?celldex and browseVignettes('celldex') for documentation
loading from cache

:::

Load raw data and suggest cell type

Code
sample_list <- read_tsv("misc/sample_cutoffs.txt", show_col_types = FALSE)

end_path <- "/filtered_feature_bc_matrix"
if (grepl("r1pl", Sys.info()[["nodename"]])) {
    data_path <- "/home/gdrobertslab/lab/Counts/"
    path_use <- "cluster_path"
} else {
    data_path <- "/Applications/scRNA_seq raw files/"
    path_use <- "James_path"
}

sobj_list <- parallel::mclapply(seq_len(nrow(sample_list)),
    mc.cores = parallelly::availableCores(),
    function(i) {
        obj_name <- sample_list$obj_name[i]
        message(obj_name)

        sobj <- tenx_load_qc(
            paste0(data_path,sample_list[[path_use]][i], end_path),
            min_cells = 3,
            min_features = 200,
            violin_plot = FALSE)

        # Add metadata to dataset
        for (colname in colnames(sample_list)) {
            sobj[[colname]] <- sample_list[[colname]][i]
        }

        # Add sample name to dataset
        sobj$sample <- obj_name

        cutoff_table <-
            tribble(~"feature",   ~"min_val",                    ~"max_val",
                    "nCount_RNA", sample_list$ncount_rna_min[i], sample_list$ncount_rna_max[i],
                    "percent.mt", 0,                             sample_list$mt_max[i])

        plotted <- feature_hist(
            sobj,
            features = c("nCount_RNA", "percent.mt"),
            cutoff_table = cutoff_table)

        ggsave(paste0("output/figures/feature_hist_", obj_name, ".png"),
            width = 10,
            height = 10,
            plot = plotted)

        sobj <- sobj %>%
            subset(nCount_RNA   >= sample_list$ncount_rna_min[i] &
                nCount_RNA   <= sample_list$ncount_rna_max[i] &
                percent.mt   <= sample_list$mt_max[i]) %>%
            process_seurat()

        sobj <- annot_cells(sobj)
        return(sobj)
})

names(sobj_list) <- sample_list$obj_name

qs::qsave(sobj_list, file = "output/rdata/sobj_list.qs")

Analyze C57BL/6 and F420 data

B6 and F420

Create the merged tumor/normal object

Process into a shared UMAP space ::: {.cell}

Code
sobj_list <- qs::qread("output/rdata/sobj_list.qs")

b6_f420_combined <- merge(sobj_list[["C57BL6"]],
        y = sobj_list[["F420"]],
        add.cell.ids = c("C57BL6", "F420")) %>%
    SCTransform(verbose = FALSE) %>%
    RunPCA(npcs = 30, verbose = FALSE) %>%
    RunUMAP(dims = 1:30, seed.use = 22, verbose = FALSE) %>%
    FindNeighbors(k.param = 30, reduction = "umap", dims = 1:2, verbose = FALSE) %>%
    FindClusters(resolution  = 0.3, verbose = FALSE)
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Code
qs::qsave(b6_f420_combined, "output_old/rdata/b6_f420_combined_all.qs")

:::

Inspect identities from SingleR calls and other meta

Code
# Check identities from SingleR calls and other meta
r_dim_plot(b6_f420_combined, group.by = "seurat_clusters", split.by = "sample")

Code
b6_f420_combined <- run_fdl(b6_f420_combined, graph = "SCT_snn")
r_dim_plot(b6_f420_combined, group.by = "seurat_clusters", split.by = "sample", reduction = "fdl")

Code
r_dim_plot(b6_f420_combined, group.by = "cell_type", split.by = "sample")

Code
r_feature_plot(b6_f420_combined, features = "cell_score", split.by = "sample")

Code
r_feature_plot(b6_f420_combined, features = "nCount_RNA", split.by = "sample")

Code
r_feature_plot(b6_f420_combined, features = "Col1a1", split.by = "sample")

Code
r_feature_plot(b6_f420_combined, features = "Col1a2", split.by = "sample")

Code
p1 <- r_dim_plot(b6_f420_combined, group.by = "cell_type", repel = TRUE)
p2 <- r_dim_plot(b6_f420_combined, group.by = "seurat_clusters")
p1 | p2
Warning: ggrepel: 48 unlabeled data points (too many overlaps). Consider
increasing max.overlaps

Rename clusters based on the calls

Code
if(file.exists("misc/b6_f420_assignments.qs")) {
    b6_f420_combined <- AddMetaData(b6_f420_combined,
        qs::qread("misc/b6_f420_assignments.qs"))
} else {
    b6_f420_combined[[]] %>%
        select(seurat_clusters, cell_type, cell_score) %>%
        arrange(seurat_clusters) %>%
        group_by(seurat_clusters, cell_type) %>%
        summarize(count = n(), score = ave(cell_score)) %>%
        arrange(seurat_clusters, -count) %>%
        unique() %>%
        print(n = 1000)

    # This list is manually curated from the SingleR assignments and scores
    # If run again, this needs to be checked for accuracy, as clusters may change
    b6_f420_combined <- RenameIdents(b6_f420_combined,
        `0` = "Monocyte/Macrophage",
        `1` = "Alveolar macrophage",
        `2` = "Monocyte/Macrophage",
        `3` = "Alveolar macrophage",
        `4` = "Monocyte/Macrophage",
        `5` = "Endothelial cell",
        `6` = "Endothelial cell",
        `7` = "Fibroblast",
        `8` = "Endothelial cell",
        `9` = "F420",
        `10` = "Distal airway cell",
        `11` = "T/NK cell",
        `12` = "T/NK cell",
        `13` = "Upper airway cell",
        `14` = "B cell",
        `15` = "F420",
        `16` = "Monocyte/Macrophage",
        `17` = "Granulocyte",
        `18` = "Dendritic cell",
        `19` = "Dendritic cell",
        `20` = "Upper airway cell",
        `21` = "Dendritic cell",
        `22` = "Pericyte",
        `23` = "T/NK cell",
        `24` = "F420",
        `25` = "T/NK cell",
        `26` = "Monocyte/Macrophage",
        `27` = "Mesothelial",
        `28` = "LowQ",
        `29` = "LowQ",
        `30` = "Alveolar macrophage",
        `31` = "Alveolar macrophage",
        `32` = "LowQ",
        `33` = "Adipocyte",
        `34` = "LowQ",
        `35` = "Adipocyte",
        `36` = "Alveolar macrophage")

    b6_f420_combined$cell_type_final <- Idents(b6_f420_combined)

    p1 <- r_dim_plot(b6_f420_combined, group.by = "cell_type_final", repel = TRUE)
    p2 <- r_dim_plot(b6_f420_combined, group.by = "seurat_clusters")
    p1 | p2

    b6_f420_assignments <- b6_f420_combined[["cell_type_final"]]

    qs::qsave(b6_f420_assignments, file = "misc/b6_f420_assignments.qs")
}

Create publication plots and save

Code
# Remove tumor cells, low quality cells, and erythrocytes
Idents(b6_f420_combined) <- b6_f420_combined$cell_type_final
b6_f420_combined <- subset(b6_f420_combined,
    idents = c("LowQ", "Mesothelial", "F420"),
    invert = TRUE)

# Plot and save
r_dim_plot(b6_f420_combined, "F420 Stroma",
    group.by = "cell_type_final",
    split.by = "sample")

Code
ggsave("output/figures/combined_b6_f420.pdf",
    width = 8,
    height = 6)

qs::qsave(b6_f420_combined, "output/rdata/b6_f420_combined.qs")

mouse_marks <- c(
    "Itgam", "Fcgr1", "Cx3cr1", # Monocyte/Macrophage
    "Cd68", "Itgax", "Mertk", # Alveolar macrophage
    "Pecam1", "Tek", "Ptprb", # Endothelial
    "Cd3d", "Itk", "Tcf7", # T/NK cell
    "Cd209a", "Ccl17", "Etv3", # Dendritic cell
    "Foxl1", "Scgb1a1", "Epcam", # Proximal airway
    "Sftpd", "Ager", "Aqp5", # Distal airway
    "Pdgfra", "Ltbp4", "Gsn", # Fibroblast
    "Cd19", "Cd79a", "Pax5", # B cell
    "Csf3r", "S100a9", "Mxd1", # Granulocyte
    "Apoc1", "Adipoq", "Retn", # Adipocyte
    "Postn", "Pde5a", "Pdzd2" # Pericyte
)

b6_f420_combined$cell_type_final <- factor(
    b6_f420_combined$cell_type_final,
    levels = c("Monocyte/Macrophage", "Alveolar macrophage", "Endothelial cell",
        "T/NK cell", "Dendritic cell", "Upper airway cell",
        "Distal airway cell", "Fibroblast", "Smooth muscle cell", "B cell",
        "Granulocyte", "Adipocyte", "Pericyte"))

DotPlot(b6_f420_combined,
    features = mouse_marks,
    group.by = "cell_type_final",
    cols = "RdBu",
    col.max = 1.5) +
    scale_y_discrete(limits = rev) +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
    coord_fixed()

Create frequency plots of stromal and immune cells

Code
subset(b6_f420_combined,
    idents = c(
        "Monocyte/Macrophage",
        "Alveolar macrophage",
        "T/NK cell",
        "Dendritic cell",
        "B cell",
        "Granulocyte"))@meta.data %>%
    ggplot(aes(sample, fill = cell_type_final)) +
        geom_bar(position = "fill") +
        theme_classic()

Code
ggsave("output/figures/b6_immune.pdf",
    width = 4,
    height = 4)

subset(b6_f420_combined,
    idents = c(
        "Endothelial cell",
        "Upper airway cell",
        "Distal airway cell",
        "Fibroblast",
        "Adipocyte",
        "Pericyte"))@meta.data %>%
    ggplot(aes(sample, fill = cell_type_final)) +
        geom_bar(position = "fill") +
        theme_classic()

Code
ggsave("output/figures/b6_stroma.pdf",
    width = 4,
    height = 4)

Analyze BALBC and K7M2 data

BALBC and K7M2

Code
sobj_list <- qs::qread("output/rdata/sobj_list.qs")

# Merge normal and tumor bearing lung for F420 and process into same UMAP space
balb_k7m2_combined <- merge(sobj_list[["BALBC"]],
        y = sobj_list[["K7M2"]],
        add.cell.ids = c("BALBC", "K7M2")) %>%
    SCTransform() %>%
    RunPCA(npcs = 30) %>%
    RunUMAP(dims = 1:30, seed.use = 111) %>%
    FindNeighbors(k.param = 30, reduction = "umap", dims = 1:2) %>%
    FindClusters(resolution  = 0.3)
Running SCTransform on assay: RNA
Running SCTransform on layer: counts.1
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Variance stabilizing transformation of count matrix of size 17507 by 9293
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 5000 cells
Found 163 outliers - those will be ignored in fitting/regularization step
Second step: Get residuals using fitted parameters for 17507 genes
Computing corrected count matrix for 17507 genes
Calculating gene attributes
Wall clock passed: Time difference of 30.71296 secs
Determine variable features
Centering data matrix
Getting residuals for block 1(of 2) for 1 dataset
Getting residuals for block 2(of 2) for 1 dataset
Centering data matrix
Finished calculating residuals for 1
Running SCTransform on layer: counts.2
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Variance stabilizing transformation of count matrix of size 18783 by 13436
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 5000 cells
Found 145 outliers - those will be ignored in fitting/regularization step
Second step: Get residuals using fitted parameters for 18783 genes
Computing corrected count matrix for 18783 genes
Calculating gene attributes
Wall clock passed: Time difference of 41.64436 secs
Determine variable features
Centering data matrix
Getting residuals for block 1(of 3) for 2 dataset
Getting residuals for block 2(of 3) for 2 dataset
Getting residuals for block 3(of 3) for 2 dataset
Centering data matrix
Finished calculating residuals for 2
Centering data matrix
Centering data matrix
Set default assay to SCT
Warning in PrepDR(object = object, features = features, verbose = verbose): The
following 47 features requested have not been scaled (running reduction without
them): Cd5l, Ighg2b, Ighg2c, Gm15056, Ighg1, Gzmd, Mcpt2, Tpsab1, Ighv14-3,
Tpsb2, Igkv3-12, Igkv1-88, AW551984, Igkv3-2, Klk9, Gzmf, Igkv4-91, Mcpt1,
Igkv6-25, Igkv6-32, Igkv12-41, Ighv5-12, Gzmg, Krt14, Igkv15-103, Sln,
Igkv17-121, Sprr1a, Cnfn, Chl1, Igkv5-48, Ifnb1, Camp, Nlrp2, Ighv1-14,
Sostdc1, Dynap, Mucl1, Krt16, Tff1, Igkv17-127, Muc5b, Igkv14-126, Crct1, Ang2,
Ighv5-4, Tph1
PC_ 1 
Positive:  Lyz2, Cxcl2, Ccl6, Il1b, Ftl1, Cybb, Apoe, Lpl, Chil3, Ctsd 
       Spp1, C1qa, Cd14, Wfdc17, Plet1, Ctss, Fth1, Tyrobp, Cebpb, Psap 
       C1qb, Arg1, Fcer1g, Ccl4, Lgals3, Laptm5, Ear2, Il1a, C1qc, Ccl9 
Negative:  Igfbp7, Ly6a, Cldn5, Ramp2, Emp2, Wfdc2, Sparc, Ctla2a, Egfl7, Epas1 
       Sftpc, Sftpb, Sftpa1, Calcrl, Sftpd, Adgrf5, Jun, Hspb1, Cdh5, Cavin2 
       Ly6c1, Cxcl15, Gpihbp1, Tmem100, Col4a1, Clec14a, Ptprb, Crip2, Slc34a2, Timp3 
PC_ 2 
Positive:  Wfdc2, Sftpb, Sftpa1, Sftpc, Sftpd, Cxcl15, Slc34a2, Chil1, Napsa, Lcn2 
       Cbr2, Cldn3, Cldn18, Lyz2, Scd1, Ager, Lamp3, Krt8, Krt18, Hc 
       Atp1b1, Muc1, S100g, Lpcat1, Sfta2, Ppp1r14c, Ptprf, Chchd10, Lgi3, Cd74 
Negative:  Cldn5, Ramp2, Ctla2a, Egfl7, Calcrl, Cdh5, Ly6c1, Icam2, Gpihbp1, Igfbp7 
       Tmem100, Clec14a, Ptprb, Hpgd, Cavin2, Pecam1, Tspan7, Kdr, Ly6a, Plvap 
       Scn7a, Epas1, Stmn2, Eng, Aqp1, Ace, Ifitm3, Ecscr, Acvrl1, Clec1a 
PC_ 3 
Positive:  Lyz2, Cxcl2, Ccl6, Cldn5, Ftl1, Ramp2, Apoe, Ly6c1, Epas1, Wfdc17 
       Egfl7, Arg1, Il1b, Ctsd, Pf4, Calcrl, C1qa, Ctsl, Ier3, Emp2 
       Icam2, Lpl, Cdh5, Adgrf5, Spp1, Igfbp7, Cybb, Ly6a, Car4, Cd36 
Negative:  Cd3g, Trbc2, H2-Eb1, Cd74, H2-Aa, Ms4a4b, Trbc1, Nkg7, Igkc, S100a4 
       H2-Ab1, Cd3d, Ccr7, Ccl5, Tnfrsf18, Icos, Tnfrsf4, Cd52, Ikzf2, Ltb 
       Ptprcap, Cd79a, Rps24, Il2rb, Trac, Ighm, Rpl12, Rps27, Rpl13, Vps37b 
PC_ 4 
Positive:  Lpl, Chil3, Plet1, Cybb, Ear2, Lyz2, Fabp1, Abcg1, Ccl6, Atp6v0d2 
       Ear1, Ctsk, Ctsd, Krt79, Wfdc21, Il18, Sirpa, Car4, Gdf15, Slc7a2 
       Sgk1, Mrc1, Laptm5, Ftl1, Slpi, Lgals3, Plin2, Cd9, Olr1, Nceh1 
Negative:  Apoe, Il1b, Cd74, H2-Eb1, H2-Aa, C1qa, H2-Ab1, S100a9, Srgn, G0s2 
       C1qb, S100a8, Cd14, Ifi27l2a, Il1r2, Acod1, Plac8, C1qc, Cxcl2, Cst3 
       Hdc, Ms4a4c, Rsad2, Ccl4, Arg1, Ifitm3, Ier3, Pf4, Lst1, Mxd1 
PC_ 5 
Positive:  Cd74, H2-Eb1, H2-Aa, H2-Ab1, Cldn5, Ctla2a, Ramp2, Egfl7, Calcrl, Epas1 
       Ly6c1, Cdh5, Adgrf5, Napsa, Tmem100, Gpihbp1, Icam2, Clec14a, Ly6a, Pecam1 
       Sftpc, Tspan7, Ptprb, AW112010, Car4, Sftpa1, Hpgd, Sftpb, Kdr, Ace 
Negative:  Bgn, Col1a2, Mgp, S100a6, Gsn, Rarres2, Col3a1, Serping1, Sparc, Col1a1 
       Rbp1, Inmt, Timp3, S100a4, Ankrd1, Mfap4, Sparcl1, Ogn, Igfbp6, Pcolce2 
       Apoe, Aldh1a1, Dcn, Ptgis, Cald1, Lgals1, Ctgf, Cxcl1, Fn1, Pcolce 
17:28:06 UMAP embedding parameters a = 0.9922 b = 1.112
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
17:28:06 Read 22729 rows and found 30 numeric columns
17:28:06 Using Annoy for neighbor search, n_neighbors = 30
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
17:28:06 Building Annoy index with metric = cosine, n_trees = 50
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
17:28:10 Writing NN index file to temp file /gpfs0/scratch/5855649/RtmpfZhFWt/file1fad301f638ef7
17:28:10 Searching Annoy index using 1 thread, search_k = 3000
17:28:18 Annoy recall = 100%
17:28:18 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
17:28:20 Initializing from normalized Laplacian + noise (using RSpectra)
17:28:21 Commencing optimization for 200 epochs, with 957822 positive edges
17:28:31 Optimization finished
Computing nearest neighbor graph
Computing SNN
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 22729
Number of edges: 751240

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9816
Number of communities: 38
Elapsed time: 1 seconds
Code
# Check identities from SingleR calls and other meta
DimPlot(balb_k7m2_combined, group.by = "seurat_clusters", split.by = "sample", label = TRUE) +
    coord_fixed() +
    theme(legend.position = "none")

Code
DimPlot(balb_k7m2_combined, group.by = "cell_type", split.by = "sample", label = TRUE) +
    coord_fixed() +
    theme(legend.position = "none")

Code
r_feature_plot(balb_k7m2_combined, features = "cell_score", split.by = "sample") +
    coord_fixed()
Coordinate system already present. Adding new coordinate system, which will
replace the existing one.

Code
r_feature_plot(balb_k7m2_combined, features = "nCount_RNA", split.by = "sample") +
    coord_fixed()
Coordinate system already present. Adding new coordinate system, which will
replace the existing one.

Code
r_feature_plot(balb_k7m2_combined, features = "Col1a1", split.by = "sample") +
    coord_fixed()
Coordinate system already present. Adding new coordinate system, which will
replace the existing one.

Code
r_feature_plot(balb_k7m2_combined, features = "Col1a2", split.by = "sample") +
    coord_fixed()
Coordinate system already present. Adding new coordinate system, which will
replace the existing one.

Code
# Rename clusters based on the calls
if(file.exists("misc/balb_k7m2_assignments.qs")) {
    balb_k7m2_combined <- AddMetaData(balb_k7m2_combined,
        qs::qread("misc/balb_k7m2_assignments.qs"))
} else {
    # This list is manually curated from the SingleR assignments and scores
    # If run again, this needs to be checked for accuracy, as clusters may change
    balb_k7m2_combined <- RenameIdents(balb_k7m2_combined,
        `0` = "Monocyte/Macrophage",
        `1` = "Alveolar macrophage",
        `2` = "Alveolar macrophage",
        `3` = "Endothelial cell",
        `4` = "Monocyte/Macrophage",
        `5` = "B cell",
        `6` = "Dendritic cell",
        `7` = "Distal airway cell",
        `8` = "T/NK cell",
        `9` = "T/NK cell",
        `10` = "K7M2",
        `11` = "T/NK cell",
        `12` = "Monocyte/Macrophage",
        `13` = "K7M2",
        `14` = "Granulocyte",
        `15` = "Endothelial cell",
        `16` = "Endothelial cell",
        `17` = "T/NK cell",
        `18` = "Granulocyte",
        `19` = "Monocyte/Macrophage",
        `20` = "Fibroblast",
        `21` = "Smooth muscle cell",
        `22` = "Upper airway cell",
        `23` = "Alveolar macrophage",
        `24` = "Endothelial cell",
        `25` = "Adipocyte",
        `26` = "Alveolar macrophage",
        `27` = "Alveolar macrophage",
        `28` = "LowQ",
        `29` = "Dendritic cell",
        `30` = "LowQ",
        `31` = "LowQ",
        `32` = "Pericyte")

    balb_k7m2_combined$cell_type_final <- Idents(balb_k7m2_combined)

    balb_k7m2_assignments <- data.frame(cell_type_final = balb_k7m2_combined$cell_type_final)
    rownames(balb_k7m2_assignments) <- names(balb_k7m2_combined$cell_type_final)

    qs::qsave(balb_k7m2_assignments, file = "misc/balb_k7m2_assignments.qs")
}

p1 <- r_dim_plot(balb_k7m2_combined, group.by = "cell_type_final")
p2 <- r_dim_plot(balb_k7m2_combined, group.by = "seurat_clusters")
p1 | p2

Code
# Remove tumor cells, low quality cells, and erythrocytes
Idents(balb_k7m2_combined) <- balb_k7m2_combined$cell_type_final
balb_k7m2_combined <- subset(balb_k7m2_combined,
    idents = c("LowQ", "K7M2"),
    invert = TRUE)

# Plot and save
r_dim_plot(balb_k7m2_combined, "K7M2 Stroma",
    group.by = "cell_type_final",
    split.by = "sample")

Code
ggsave("output/figures/combined_balb_k7m2.pdf",
    width = 8,
    height = 6)

qs::qsave(balb_k7m2_combined, "output/rdata/balb_k7m2_combined.qs")

mouse_marks <- c(
    "Itgam", "Fcgr1", "Cx3cr1", # Monocyte/Macrophage
    "Cd68", "Itgax", "Mertk", # Alveolar macrophage
    "Pecam1", "Tek", "Ptprb", # Endothelial
    "Cd3d", "Itk", "Tcf7", # T/NK cell
    "Cd209a", "Ccl17", "Etv3", # Dendritic cell
    "Foxl1", "Scgb1a1", "Epcam", # Proximal airway
    "Sftpd", "Ager", "Aqp5", # Distal airway
    "Pdgfra", "Ltbp4", "Gsn", # Fibroblast
    "Mgp", "Myh11", "Acta2", # Smooth muscle
    "Cd19", "Cd79a", "Pax5", # B cell
    "Csf3r", "S100a9", "Mxd1", # Granulocyte
    "Apoc1", "Adipoq", "Retn", # Adipocyte
    "Postn", "Pde5a", "Pdzd2" # Pericyte
)

balb_k7m2_combined$cell_type_final <- factor(
    balb_k7m2_combined$cell_type_final,
    levels = c("Monocyte/Macrophage", "Alveolar macrophage", "Endothelial cell",
        "T/NK cell", "Dendritic cell", "Upper airway cell",
        "Distal airway cell", "Fibroblast", "Smooth muscle cell", "B cell",
        "Granulocyte", "Adipocyte", "Pericyte"))

DotPlot(balb_k7m2_combined,
    features = mouse_marks,
    group.by = "cell_type_final",
    cols = "RdBu",
    col.max = 1.5) +
    scale_y_discrete(limits = rev) +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
    coord_fixed()

Code
# Create frequency plots of stromal and immune cells
subset(balb_k7m2_combined,
    idents = c(
        "Monocyte/Macrophage",
        "Alveolar macrophage",
        "T/NK cell",
        "Dendritic cell",
        "B cell",
        "Granulocyte"))@meta.data %>%
    ggplot(aes(sample, fill = cell_type_final)) +
        geom_bar(position = "fill") +
        theme_classic()

Code
ggsave("output/figures/balb_immune.pdf",
    width = 4,
    height = 4)

subset(balb_k7m2_combined,
    idents = c(
        "Endothelial cell",
        "Upper airway cell",
        "Distal airway cell",
        "Fibroblast",
        "Adipocyte",
        "Pericyte"))@meta.data %>%
    ggplot(aes(sample, fill = cell_type_final)) +
        geom_bar(position = "fill") +
        theme_classic()

Code
ggsave("output/figures/balb_stroma.pdf",
    width = 4,
    height = 4)

Analyze macrophage populations within the tumor samples

This includes both metastatic and primary tumor samples Merged object saved to output/rdata/all_sobj.qs The subset macrophages from lung mets and primary tumors are saved to output/rdata/immune_recurl.qs

Make a Seurat object of all the data (not corrected for model)

Code
if(!file.exists("output/rdata/all_sobj.qs")) {
    sobj_list <- qs::qread("output/rdata/sobj_list.qs")
    all_sobj <-
        merge(sobj_list[[1]],
            sobj_list[2:length(sobj_list)],
            add.cell.ids = names(sobj_list)) %>%
            JoinLayers() %>%
        process_seurat()
    qs::qsave(all_sobj, file = "output/rdata/all_sobj.qs")
} else {
    all_sobj <- qs::qread("output/rdata/all_sobj.qs")
}

all_dim <-
    r_dim_plot(all_sobj,
        group.by = c("obj_name", "cell_type"),
        label = TRUE,
        repel = TRUE,
        shuffle = TRUE) +
    NoLegend()

ggsave("output/figures/all_dim.pdf",
       width = 25,
       height = 8)
Warning: ggrepel: 7 unlabeled data points (too many overlaps). Consider
increasing max.overlaps
Code
all_dim
Warning: ggrepel: 45 unlabeled data points (too many overlaps). Consider
increasing max.overlaps

Correct the data for model

Code
if(!file.exists("output/rdata/all_sobj_integrated.qs")) {
    all_sobj$model <- "BALBC"
    all_sobj$model[
        all_sobj$sample %in% c("C57BL6", "F420", "tibia_F420")
        ] <- "C57BL6"
    all_sobj <- all_sobj %>%
        harmony::RunHarmony(group.by.vars = "model", dims.use = 1:30) %>%
        RunUMAP(reduction = "harmony", dims = 1:30) %>%
        FindNeighbors(reduction = "harmony", dims = 1:30) %>%
        FindClusters(resolution = 0.2)

    qs::qsave(all_sobj, "output/rdata/all_sobj_integrated.qs")
} else {
    all_sobj <- qs::qread("output/rdata/all_sobj_integrated.qs")
}

p1 <- r_dim_plot(all_sobj,
    group.by = "model",
    shuffle = TRUE) +
    theme(legend.position = "right")
p2 <- r_dim_plot(all_sobj,
    group.by = "cell_type",
    shuffle = TRUE,
    repel = TRUE)
p3 <- r_dim_plot(all_sobj,
    group.by = "seurat_clusters")
p1 + p2 + p3
Warning: ggrepel: 60 unlabeled data points (too many overlaps). Consider
increasing max.overlaps

Grab just macrophages and sub cluster them

Code
if(!file.exists("output/rdata/output/rdata/macs_subclustered.qs")) {
    if (!file.exists("misc/murine_macs.qs")) {
        mac_ids <- all_sobj@meta.data %>%
            filter(cell_type %in% c(
                "Monocytes",
                "Mono",
                "Int Mf",
                "Macrophages",
                "Macrophages activated",
                "DC1",
                "DC2"
            )) %>%
            rownames()
        qs::qsave(mac_ids, "misc/murine_macs.qs")
    } else {
        mac_ids <- qs::qread("misc/murine_macs.qs")
    }

    macs <- subset(all_sobj, cells = mac_ids) %>%
        FindVariableFeatures() %>%
        ScaleData() %>%
        RunPCA() %>%
        harmony::RunHarmony(group.by.vars = "model", dims.use = 1:30) %>%
        RunUMAP(reduction = "harmony", dims = 1:30) %>%
        FindNeighbors(reduction = "harmony", dims = 1:30) %>%
        FindClusters(resolution = seq(0.1, 0.8, 0.1))

    # clustree::clustree(macs)

    # A resolution of 0.3 looks pretty good
    macs$seurat_clusters <- macs$RNA_snn_res.0.3
    Idents(macs) <- macs$seurat_clusters

    p1 <- r_dim_plot(macs, group.by = "seurat_clusters")
    p2 <- r_dim_plot(macs, group.by = "cell_type")
    p3 <- r_dim_plot(macs, group.by = "sample")
    p1 + p2 + p3

    # Establish groups by tissue type
    macs$tissue <- "unassigned"
    macs$tissue[macs$sample %in% c("tibia_F420", "tibia_K7M2")] <- "Tibia Tumor"
    macs$tissue[macs$sample %in% c("F420", "K7M2")] <- "Lung Metastasis"
    macs$tissue[macs$sample %in% c("C57BL6", "BALBC")] <- "Healthy Lung"

    qs::qsave(macs, "output/rdata/macs_subclustered.qs")
} else {
    macs <- qs::qread("output/rdata/macs_subclustered.qs")
}
Finding variable features for layer counts
Centering and scaling data matrix
Warning: Different features in new layer data than already exists for
scale.data
PC_ 1 
Positive:  Ankrd1, Crabp2, Cald1, Steap2, Pclaf, Rpl39l, Gxylt2, Cks1b, Cavin2, Birc5 
       Prss23, Clca3a1, Bmp4, Pcdh19, Htra1, Gm2694, Nlrp2, Top2a, Amotl2, Gpx8 
       Stmn1, 4930461G14Rik, Ghr, Col1a2, Il1rl1, Cavin3, Apol9a, Dpysl3, Cdca8, Lox 
Negative:  Ctss, Lyz2, Il1b, Ccrl2, Clec4d, Clec4e, Ms4a4c, Fcgr1, Il1rn, Ly6c2 
       Plac8, F10, Ctsb, Ifi27l2a, Osm, Cxcl2, Slfn5, Wfdc17, Thbs1, Fcgr4 
       Ifitm6, Apoe, Ccl6, Msr1, Clec4n, Irf7, Hilpda, Mgst1, Ier3, Slc7a11 
PC_ 2 
Positive:  Tbc1d4, Cst3, H2-Eb1, H2-Aa, P2ry10, Ccr7, Ccl17, Cd74, H2-Ab1, Ccl22 
       Grasp, Etv3, Klrd1, H2-Oa, Napsa, Adam23, Traf1, Il1r2, Serpinb9, Flt3 
       Klrb1b, Cldn1, Tnip3, Rogdi, H2-DMb2, Nr4a3, Plbd1, H2-Ob, Rhof, Stat4 
Negative:  Ctsl, Ctsb, Ctsd, Msr1, Lyz2, Trem2, Apoe, Ccl9, Arg1, Clec4d 
       Ccl6, Ecm1, Fam20c, Fn1, Pf4, Spp1, Timp2, Ier3, Lgals1, Fcgr1 
       Pld3, Il1a, C1qa, C1qc, Lgals3, Pdpn, Folr2, Vat1, Cxcl2, Hal 
PC_ 3 
Positive:  Scgb1a1, Sftpc, Sparc, Igfbp7, Mgp, Chil3, Col1a1, Serpinh1, Bgn, Col3a1 
       Sftpa1, Tm4sf1, Ramp2, Col1a2, Emp2, Cldn5, Cav1, Hbb-bs, Adgrf5, Wfdc2 
       Egfl7, Col4a1, Cald1, Fermt2, Gstm1, Sparcl1, Nfib, Epas1, Gpihbp1, Col6a1 
Negative:  Pclaf, Hist1h2ap, Top2a, Hist1h1b, Cdca8, Nusap1, Mki67, Birc5, Kif15, Ccna2 
       Phf11b, Gm10260, Spc24, Smc2, Hmgb2, Tpm3-rs7, Cdca3, Irf7, Ifi44, Tpx2 
       Hist1h2ae, Cenpe, Gatm, Ifi202b, Stmn1, Oasl1, Ifi27l2a, Dek, Fcgr1, Asf1b 
PC_ 4 
Positive:  Ifi44, Irf7, Isg15, Ifi202b, Ifitm3, Plac8, Rsad2, Phf11b, Slfn4, Ifit1 
       Oasl1, Ifi209, Ms4a4c, Slfn5, Tpm3-rs7, Slfn1, Oas3, Rtp4, Ifi44l, Ifit2 
       Ifit3, Isg20, Mx1, Usp18, Gm10260, Phf11d, Ifi214, Phf11a, Ly6c1, Ifi47 
Negative:  C1qb, C1qc, Trem2, Apoe, Ms4a7, Olfml3, C1qa, Stmn1, Cx3cr1, Fcrls 
       Lmna, Top2a, Hist1h1b, Birc5, Pclaf, Syngr1, Cdca3, Hist1h2ap, Nusap1, Hist1h2ae 
       Ccna2, Lyz2, Cdca8, Tk1, Ube2c, Cd63, Timp2, Tubb5, Spc24, Kif15 
PC_ 5 
Positive:  Plac8, Ms4a4c, Ly6c2, Ifitm3, Isg15, Isg20, Ly6c1, Irf7, Ifit3, Ifit2 
       Rtp4, Ifi27l2a, Ifi44, Slfn1, Rsad2, Ifitm6, Ifit1, Adgre4, Ifit3b, Ifi47 
       Mx1, Treml4, Ace, Sell, Ifit1bl1, Hba-a1, Cmpk2, Hba-a2, Hp, Ifi214 
Negative:  Tnfaip2, Mt1, Vat1, Cd9, Pld3, Ctsd, Pdpn, Rhoc, Cd63, Lipa 
       Tbc1d4, Mt2, Gpnmb, Marcksl1, Nr4a3, Il1a, Tnfsf9, Ccl22, Flrt3, Gadd45b 
       Ecm1, Creg1, Srxn1, Rnf128, Fabp5, Slc7a11, Ccr7, Ahnak, Cxcl2, Fam20c 
Transposing data matrix
Initializing state using k-means centroids initialization
Harmony 1/10
Harmony 2/10
Harmony 3/10
Harmony 4/10
Harmony 5/10
Harmony 6/10
Harmony converged after 6 iterations
18:10:59 UMAP embedding parameters a = 0.9922 b = 1.112
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
18:10:59 Read 24160 rows and found 30 numeric columns
18:10:59 Using Annoy for neighbor search, n_neighbors = 30
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
18:10:59 Building Annoy index with metric = cosine, n_trees = 50
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
18:11:02 Writing NN index file to temp file /gpfs0/scratch/5855649/RtmpfZhFWt/file1fad302000c726
18:11:02 Searching Annoy index using 1 thread, search_k = 3000
18:11:10 Annoy recall = 100%
18:11:11 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
18:11:13 Initializing from normalized Laplacian + noise (using RSpectra)
18:11:13 Commencing optimization for 200 epochs, with 1062272 positive edges
18:11:24 Optimization finished
Computing nearest neighbor graph
Computing SNN
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 24160
Number of edges: 890954

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9604
Number of communities: 16
Elapsed time: 4 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 24160
Number of edges: 890954

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9427
Number of communities: 19
Elapsed time: 4 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 24160
Number of edges: 890954

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9315
Number of communities: 21
Elapsed time: 3 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 24160
Number of edges: 890954

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9220
Number of communities: 22
Elapsed time: 3 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 24160
Number of edges: 890954

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9153
Number of communities: 23
Elapsed time: 4 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 24160
Number of edges: 890954

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9082
Number of communities: 23
Elapsed time: 3 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 24160
Number of edges: 890954

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9021
Number of communities: 27
Elapsed time: 3 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 24160
Number of edges: 890954

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8968
Number of communities: 26
Elapsed time: 4 seconds

Plot the macrophages

Code
if(!file.exists("misc/murine_macs_assignments.qs")) {
    if(!file.exists("output/rdata/macs_subclustered_cleaned.qs")) {
        r_dim_plot(macs, split.by = "tissue")
        macs <- run_fdl(macs)
        r_dim_plot(macs, split.by = "tissue", reduction = "fdl")

        deg_macs_1 <- FindAllMarkers(macs)

        gt::gt(deg_macs_1 %>%
            group_by(cluster) %>%
            filter(pct.1 > 0.2) %>%
            slice_max(avg_log2FC, n = 20) %>%
            arrange(cluster, -avg_log2FC)) %>%
            gt::tab_header(
                "DEGs in murine tumor-associated monouclear cells (before cleaning)")

        # Clean up misclassified clusters
        macs <- subset(macs,
            idents = c(
                6, # Epithelial
                8, # Fibroblast
                9, # Unknown, Tumor?
                10, # Endothelial
                11, # Lymphocytes
                12, # Unknown, Tumor?
                # 13, # Alveolar macs
                16, # Fibroblasts
                17, # Goblet
                18, # Erythrocyte
                19, # Neural
                20 # Mast
            ),
            invert = TRUE)

        # Reprocess and plot
        macs <- macs %>%
            FindVariableFeatures() %>%
            ScaleData() %>%
            RunPCA() %>%
            harmony::RunHarmony(group.by.vars = "model", dims.use = 1:30) %>%
            RunUMAP(reduction = "harmony",
                dims = 1:30,
                min.dist = 0.5,
                spread = 5) %>%
            FindNeighbors(reduction = "harmony", dims = 1:30) %>%
            FindClusters(resolution = seq(0.1, 1, 0.1))

        # A resolution of 1 looks pretty good
        Idents(macs) <- macs$RNA_snn_res.1
        r_dim_plot(macs, split.by = "tissue")

        macs <- run_fdl(macs)
        r_dim_plot(macs, split.by = "tissue", reduction = "fdl")

        qs::qsave(macs, "output/rdata/macs_subclustered_cleaned.qs")
    } else {
        macs <- qs::qread("output/rdata/macs_subclustered_cleaned.qs")
    }

    # Re-characterize the subclustered macs
    deg_macs_2 <- FindAllMarkers(macs)

    gt::gt(deg_macs_2 %>%
        group_by(cluster) %>%
        filter(pct.1 > 0.5) %>%
        slice_max(avg_log2FC, n = 20) %>%
        arrange(cluster, -avg_log2FC)) %>%
        gt::tab_header(
            "DEGs in murine tumor-associated monouclear cells (cleaned)")

    # Label against the murine tumor immune atlas
    # https://doi.org/10.1101/gr.273300.120
    macs_matrix <- LayerData(
        macs,
        assay = "RNA",
        layer = "counts")

    mac_marks <- readxl::read_xlsx(
        "misc/mac_mono_subsets.xlsx") %>%
        filter(Species == "Mouse") %>%
        select(Annotation, ID) %>%
        group_by(Annotation) %>%
        nest() %>%
        as.list()
    names(mac_marks$data) <- mac_marks$Annotation
    mac_marks <- mac_marks$data

    mac_sigs <- lapply(mac_marks, function(x) {
        as.character(x) %>%
            GSEABase::GeneSet()
    })

    mac_scores <- lapply(seq_along(mac_sigs), function(i) {
        AUCell::AUCell_run(macs_matrix, mac_marks[[i]]) %>%
        AUCell::getAUC() %>%
        as.data.frame()
    }) %>%
        bind_rows() %>%
        t() %>%
        as.data.frame()
    names(mac_scores) <- names(mac_marks)

    macs <- AddMetaData(macs, mac_scores)

    mac_scores <- macs[[]] %>%
        select(
            seurat_clusters,
            IFN_TAMs,
            Inflam_TAMs,
            LA_TAMs,
            Angio_TAMs,
            Reg_TAMs,
            Prolif_TAMs,
            RTM_TAMs,
            cTIMs,
            ncMonos) %>%
        group_by(seurat_clusters) %>%
        summarise_all(mean) %>%
        as.data.frame()

    rownames(mac_scores) <- mac_scores[, 1]
    mac_scores <- mac_scores[, -1]
    data.matrix(mac_scores) %>%
        heatmap()

    auc_plots <- lapply(colnames(mac_scores), function(i) {
        r_feature_plot(macs, i, min.cutoff = 0.2, max.cutoff = 0.6)
    })
    wrap_plots(auc_plots, ncol = 4)

    # Look at expression of key marker genes by cluster
    mac_genes <- c(
        "Mmp9", "Ckb", "Ctsk",  # Osteoclast-TAMs, https://doi.org/10.1186/s41232-022-00213-x
        "Trem2", "Cd9", "Gpnmb", "Spp1",  # Scar-associated, see https://doi.org/10.1038/s41586-019-1631-3
        # TAMs, generic
        "Ifit1", "Ifit2", "Cxcl10", # IFN-TAMs
        "Il1a", "Il1b", "Fn1",  # Inflammatory-TAMs
        # "Acp5", "Ctsk", # LA-TAMs
        "Top2a", "Hist1h1b", "Birc5",  # Cycling
        "Marco", "Siglecf", "Itgax", # Alveolar
        "Lyve1", "Mrc1",  # Tissue resident/interstitial
        "S100a8", "Sell", "Ifitm6", "Ly6c2",  # cMonocyte
        "Itgal", "Ace", "Spn", "Cx3cr1", # ncMonocyte
        "Cd209a", "Ccr7", "Dcstamp", # Pre-DC
        "H2-Ab1", "Clec9a", "Irf8",  # DC1
        "Fscn1", "Ccl5") # DC2

    DotPlot(macs,
        features = mac_genes,
        cols = "RdBu",
        col.max = 1.5) +
        scale_y_discrete(limits = rev) +
        theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
        coord_fixed()

    r_dim_plot(macs, split.by = "tissue")

    # What distinguishes the big cluster - 0, 1, 5?
    FindMarkers(macs, ident.1 = c(0, 1, 5)) %>%
        filter(pct.1 > 0.5) %>%
        arrange(-avg_log2FC) %>%
        head(n = 20)

    # Name the clusters
    macs2 <- RenameIdents(macs,
        `0` = "Scar-TAMs",
        `1` = "TAMs",
        `2` = "IFN-TAMs",
        `3` = "TAMs",
        `4` = "TAMs",
        `5` = "Inflammatory-TAMs",
        `6` = "DC1",
        `7` = "TAMs",
        `8` = "Pre-DC",
        `9` = "Pre-DC",
        `10` = "Scar-TAMs",
        `11` = "cMonocyte",
        `12` = "Cycling",
        `13` = "cMonocyte",
        `14` = "ncMonocyte",
        `15` = "Pre-DC",
        `16` = "Cycling",
        `17` = "TAMs",
        `18` = "Alveolar",
        `19` = "Interstitial",
        `20` = "Osteoclast-TAMs",
        `21` = "Scar-TAMs",
        `22` = "cMonocyte")

    macs2$macs_assignment <- Idents(macs2) %>%
        factor(levels = c(
            "Osteoclast-TAMs",
            "Scar-TAMs",
            "TAMs",
            "IFN-TAMs",
            "Inflammatory-TAMs",
            "Cycling",
            "Alveolar",
            "Interstitial",
            "cMonocyte",
            "ncMonocyte",
            "Pre-DC",
            "DC1",
            "DC2",
            0:25))

    r_dim_plot(macs2, split.by = "tissue")

    DotPlot(macs2,
        features = mac_genes,
        group.by = "macs_assignment",
        cols = "RdBu",
        col.max = 1.5) +
        scale_y_discrete(limits = rev) +
        theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
        coord_fixed()

    qs::qsave(macs2$macs_assignment, "misc/murine_macs_assignments.qs")
} else {
    macs_assignment <- qs::qread("misc/murine_macs_assignments.qs")
    macs <- subset(macs, cells = names(macs_assignment))
    macs$macs_assignment <- macs_assignment %>%
        factor(levels = c(
            "Osteoclast-TAMs",
            "Scar-TAMs",
            "TAMs",
            "IFN-TAMs",
            "Inflammatory-TAMs",
            "Cycling",
            "Alveolar",
            "Interstitial",
            "cMonocyte",
            "ncMonocyte",
            "Pre-DC",
            "DC1"))

    macs <- macs %>%
        FindVariableFeatures() %>%
        ScaleData() %>%
        RunPCA() %>%
        harmony::RunHarmony(group.by.vars = "model", dims.use = 1:30) %>%
        RunUMAP(reduction = "harmony",
            dims = 1:30,
            local.connectivity = 10L,
            spread = 5) %>%
        FindNeighbors(reduction = "harmony", dims = 1:30) %>%
        FindClusters(resolution = 0.8)

    r_dim_plot(macs, split.by = "tissue", group.by = "macs_assignment")

    macs <- run_fdl(macs)
    r_dim_plot(macs,
        split.by = "tissue",
        reduction = "fdl",
        group.by = "macs_assignment")

    qs::qsave(macs, "output/rdata/macs_subclustered_cleaned.qs")
}
Finding variable features for layer counts
Centering and scaling data matrix
Warning: Different features in new layer data than already exists for
scale.data
PC_ 1 
Positive:  Ctsb, Ftl1, Ctsd, Lamp1, Ctsl, Lgmn, Trem2, Lyz2, Apoe, Cstb 
       Capg, Plin2, Cd63, Grn, C3ar1, Cd68, Fth1, Msr1, Clec4d, Ctss 
       Ecm1, Anxa3, Pld3, Abca1, Timp2, Spp1, Syngr1, Ccl9, Fn1, Psap 
Negative:  Cst3, H2-Eb1, H2-Ab1, H2-Aa, P2ry10, Napsa, H2-Oa, Plbd1, Klrd1, Tbc1d4 
       Ccl17, Il1r2, Grasp, Ckb, Cd74, Cldn1, Itgae, Klrb1b, H2-DMb2, H2-DMb1 
       Ccl22, Epcam, Adam23, H2-Ob, Etv3, Wdfy4, H2afz, Lsp1, Cytip, Kit 
PC_ 2 
Positive:  Plac8, Ifitm3, Ms4a4c, Irf7, Ifi44, Isg15, Samhd1, Ly6c1, Ly6c2, AY036118 
       Slfn4, Rsad2, Ifitm6, Slfn5, Ifi27l2a, Zbp1, Ifi202b, Slfn1, Isg20, Mxd1 
       Ifi209, Rtp4, Ifit1, Phf11b, Ifi44l, Oas3, Lst1, Oasl1, Ifit2, Ifit3 
Negative:  Stmn1, Pclaf, Top2a, Birc5, Tubb5, Cdca3, Mki67, Nusap1, Hist1h1b, Cdca8 
       Ccna2, Hist1h2ap, C1qb, Spc24, Smc2, Prc1, C1qc, Hist1h2ae, Cdk1, Ube2c 
       Cks1b, Tk1, Kif11, C1qa, Timp2, Lmna, Cenpe, Tuba1b, Cenpf, Cd63 
PC_ 3 
Positive:  Isg15, Irf7, Ifi44, Pclaf, Birc5, Top2a, Ifi202b, Ms4a4c, Ifitm3, Hist1h2ap 
       Hist1h1b, Cdca8, Ifit3, Nusap1, Rsad2, Phf11b, Rtp4, Ifi27l2a, Oasl1, Fcgr1 
       Cdca3, Ifit2, Spc24, Stmn1, Zbp1, Hist1h2ae, Ccna2, Ifit1, Mndal, Prc1 
Negative:  Scgb1a1, Cd9, Sqstm1, Tnfaip2, Txnrd1, Gpnmb, Ubc, Tppp3, Slc7a11, Sftpc 
       Gstm1, Mpc1, Rpl29, Srxn1, Pdlim1, Crip1, Tnfsf9, Cd83, Ahnak2, Ahnak 
       Rgcc, Pld3, Lsp1, Cd2, Cd14, Vat1, Sh3bp5, Chil3, Cav1, Lyz2 
PC_ 4 
Positive:  Cx3cr1, Lst1, Zfp36l2, Gm34084, Ms4a7, Mafb, Selenop, Ighm, Ntpcr, Fos 
       Tmem176b, Rpl29, Ace, Adgre4, Tmem119, Jun, Klf2, Pou2f2, Fcrls, Eno3 
       Col1a1, Apoe, Clec4a1, Tmem176a, Slc9b2, Treml4, C1qb, Dusp6, C1qa, Itga6 
Negative:  Il1a, Cxcl2, Tnfaip2, Nlrp3, Malt1, Nfkbia, Ptgs2, Anxa2, Crip1, Gadd45b 
       Il1rn, Mt1, Flrt3, Hilpda, Bcl2a1d, Pdpn, Bcl2a1b, Txnrd1, Il7r, Tnfaip3 
       Spp1, Il1b, Ccrl2, Marcksl1, Cxcl1, Bst1, Bhlhe40, Mki67, Nr4a3, S100a10 
PC_ 5 
Positive:  C1qa, C1qc, Ccl8, C1qb, Cadm1, Ms4a7, Ptms, Cd81, Timp2, Ccl12 
       Irf8, Cst3, Pf4, Mrc1, Cd63, H2-Eb1, Gas6, Folr2, Selenop, Hpgds 
       Fcgrt, Vcam1, H2-Ab1, Pmepa1, Ckb, Grn, H2-Aa, Cd72, Itgae, Zmynd15 
Negative:  Ace, Adgre4, Itgal, Treml4, Spn, Ifitm6, Clec4e, Hp, Ceacam1, Gsr 
       Trem1, Rasgrp2, Slc16a3, Trem3, Eno3, Pglyrp1, Cd300e, Plac8, S100a4, Tgm2 
       Adgre5, Samsn1, Pou2f2, Stk10, Thbs1, Cdc42ep2, Cd300ld, Cybb, F10, Cks2 
Transposing data matrix
Initializing state using k-means centroids initialization
Harmony 1/10
Harmony 2/10
Harmony 3/10
Harmony 4/10
Harmony 5/10
Harmony converged after 5 iterations
18:17:49 UMAP embedding parameters a = 0.1102 b = 0.8526
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
18:17:49 Read 18884 rows and found 30 numeric columns
18:17:49 Using Annoy for neighbor search, n_neighbors = 30
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
18:17:49 Building Annoy index with metric = cosine, n_trees = 50
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
18:17:51 Writing NN index file to temp file /gpfs0/scratch/5855649/RtmpfZhFWt/file1fad30653c91e2
18:17:51 Searching Annoy index using 1 thread, search_k = 3000
18:17:57 Annoy recall = 100%
18:17:58 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
18:17:59 18884 smooth knn distance failures
18:18:00 Initializing from normalized Laplacian + noise (using RSpectra)
18:18:00 Commencing optimization for 200 epochs, with 301290 positive edges
18:18:14 Optimization finished
Computing nearest neighbor graph
Computing SNN
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 18884
Number of edges: 669054

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8569
Number of communities: 21
Elapsed time: 2 seconds
Code
p1 <- r_dim_plot(macs, group.by = "seurat_clusters")
p2 <- r_dim_plot(macs, group.by = "cell_type")
p3 <- r_dim_plot(macs, group.by = "sample", shuffle = TRUE)
p1 + p2 + p3

Code
r_dim_plot(macs, split.by = "tissue", group.by = "macs_assignment")

Code
# Refine the markers for the macrophage cell types
mac_genes <- c(
    "Mmp9", "Ctsk", "Nfatc1", # Osteoclast-TAMs, https://doi.org/10.1186/s41232-022-00213-x
    "Cd9", "Trem2", "Spp1", "Gpnmb", # Scar-associated, see https://doi.org/10.1038/s41586-019-1631-3
    "Ms4a7", "Selenop", # TAMs, generic
    "Ifit1", "Ifit2", "Cxcl10", # IFN-TAMs
    "Il1a", "Il1b", "Fn1",  # Inflammatory-TAMs
    "Top2a", "Hist1h1b", "Birc5",  # Cycling
    "Ly6c2", "Sell", "Ifitm6",  # cMonocyte
    "Itgal", "Ace", "Spn", # ncMonocyte
    "Cd209a", "Ccr7", "Dcstamp", # Pre-DC
    "Xcr1", "Clec9a", "Irf8") # DC1

Idents(macs) <- macs$macs_assignment

macs2 <- subset(macs,
    macs_assignment %in% c("Interstitial", "Alveolar"),
    invert = TRUE)

DotPlot(macs2,
    features = mac_genes,
    group.by = "macs_assignment",
    cols = "RdBu",
    col.max = 1.5) +
    scale_y_discrete(limits = rev) +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
    coord_fixed()

Code
r_dim_plot(macs2, split.by = "tissue")

Code
cluster_counts <-
    table(macs2$obj_name, macs2$macs_assignment) %>%
    as.data.frame() %>%
    dplyr::rename("Sample" = "Var1",
                  "Cluster" = "Var2",
                  "Count" = "Freq") %>%
    group_by(Cluster) %>%
    arrange(desc(Count), .by_group = TRUE) %>%
    ggplot(aes(y = Cluster, x = Sample, fill = log10(Count))) +
    geom_tile() +
    geom_text(aes(label = Count), color = "white")

cluster_counts

Code
# Save the final object
qs::qsave(macs, "output/rdata/final_murine_macs.qs")

Run GSEA on the macrophage populations

Define pathway categories

Code
cat_tib <-
    tribble(~category,          ~subcat,    ~label,
            "C2",               "CP:KEGG",  "KEGG",
            "C3",               "TFT:GTRD", "Transcription factors",
            "C5",               "GO:BP",    "GO Biological Process",
            "C5",               "GO:MF",    "GO Molecular Function",
            "C8",               "",         "Cell type")

Run GSEA on all macs between primary and met

Code
immune_recurl <- qs::qread("output/rdata/immune_recurl.qs")

tumor_only <-
    subset(immune_recurl,
           obj_name %in% c("K7M2", "F420", "tibia_F420", "tibia_K7M2"))

tumor_only$comp <-
    if_else(tumor_only$obj_name == "tibia_F420" |
            tumor_only$obj_name == "tibia_K7M2",
            "Primary",
            "Met")

qs::qsave(tumor_only,
          file = "output/rdata/macs_tumor_only.qs")

Calculate logfc using same method as Seurat

This is much faster than FindMarkers() and gives the same results ::: {.cell}

Code
prim_met_logfc <-
    calc_logfc(tumor_only,
               group_var = "comp",
               group_1 = "Primary",
               group_2 = "Met") %>%
    arrange(desc(log_fc)) %>%
    pull(log_fc, name = gene)
As of Seurat v5, we recommend using AggregateExpression to perform pseudo-bulk analysis.
This message is displayed once per session.
Code
gsea_out <-
    parallel::mclapply(seq_len(nrow(cat_tib)),
                       mc.cores = 5,
                       function(i) {
        message(cat_tib$label[i])
        kegg_ref <-
            msigdbr::msigdbr(species = "Mus musculus",
                            category = cat_tib$category[i],
                            subcategory = cat_tib$subcat[i]) %>%
                split(x = .$gene_symbol, f = .$gs_name)

        fgsea::fgseaMultilevel(kegg_ref,
                               prim_met_logfc,
                               minSize = 15,
                               maxSize = 500,
                               nPerm = 1000) %>%
            arrange(desc(NES)) %>%
            mutate(pathway = str_replace_all(pathway, "_", " "),
                   category = cat_tib$category[i],
                   sub_cat = cat_tib$subcat[i],
                   cat_label = cat_tib$label[i]) %>%
            filter(padj < 0.05)
    }) %>%
    bind_rows()

:::

Plot GSEA output

Code
top_paths <-
    gsea_out %>%
    group_by(cat_label, NES > 0) %>%
    arrange(desc(abs(NES)), .by_group = TRUE) %>%
    slice_head(n = 10) %>%
    pull(pathway) %>%
    str_wrap(width = 80)

gsea_dots <-
    gsea_out %>%
    filter(pathway %in% top_paths) %>%
    mutate(pathway = fct_reorder(pathway, NES)) %>%
    ggplot(aes(y = pathway,
               x = NES,
               size = -log10(padj))) +
    geom_point() +
    theme(axis.text.y = element_text(size = 4))

ggsave("output/figures/gsea_dots_macs_prim_met.pdf",
       width = 8,
       height = 15,
       plot = gsea_dots)

GSEA comparing met to primary for each cluster

Calculate logfc between met and primary for each cluster

Code
min_cell_comp <- 10
all_logfc <-
    sapply(unique(tumor_only$clust_2),
           USE.NAMES = TRUE,
           function(x) {
        test_data <-
            subset(tumor_only,
                   clust_2 == x)

        if (min(table(test_data$comp)) > min_cell_comp &&
            length(unique(test_data$comp)) == 2) {
            output <-
                calc_logfc(test_data,
                            group_var = "comp",
                            group_1 = "Primary",
                            group_2 = "Met") %>%
                arrange(desc(log_fc)) %>%
                pull(log_fc, name = gene)
        } else {
            output <- NULL
        }
        return(output)
    })

all_logfc <- all_logfc[!all_logfc %in% list(NULL)]

Run the GSEA

Code
set.seed(1337)

gsea_out <-
    parallel::mclapply(seq_len(nrow(cat_tib)),
                       mc.cores = 5,
                       function(i) {
        message(cat_tib$label[i])
        kegg_ref <-
            msigdbr::msigdbr(species = "Mus musculus",
                            category = cat_tib$category[i],
                            subcategory = cat_tib$subcat[i]) %>%
                split(x = .$gene_symbol, f = .$gs_name)

            parallel::mclapply(names(all_logfc),
                            mc.cores = 5,
                            function(x) {
                fgsea::fgseaMultilevel(kegg_ref,
                                       all_logfc[[x]],
                                       minSize = 15,
                                       maxSize = 500,
                                       nPerm = 1000) %>%
                    arrange(desc(NES)) %>%
                    mutate(pathway = str_replace_all(pathway, "_", " "),
                           cluster = x,
                           category = cat_tib$category[i],
                           sub_cat = cat_tib$subcat[i],
                           cat_label = cat_tib$label[i]) %>%
                    filter(padj < 0.05)
            }) %>%
            bind_rows()
    }) %>%
    bind_rows()

Plot met-primary GSEA output

Code
lapply(unique(gsea_out$cat_label),
       function(x) {
    top_paths <-
        gsea_out %>%
        filter(cat_label == x) %>%
        group_by(cluster) %>%
        arrange(desc(abs(NES)), .by_group = TRUE) %>%
        slice_head(n = 20) %>%
        pull(pathway) %>%
        str_wrap(width = 80)
    gsea_out %>%
        filter(pathway %in% top_paths) %>%
        pivot_wider(names_from = pathway,
                    values_from = NES,
                    id_cols = c(cluster),
                    values_fill = 0) %>%
        column_to_rownames("cluster") %>%
        t() %>%
        pheatmap::pheatmap(main = x,
                           filename = paste0("output/figures/gsea_",
                                             x,
                                             ".pdf"),
                           width = 8,
                           height = 12,
                           fontsize_row = 6)
    })
[[1]]

[[2]]

[[3]]

[[4]]

[[5]]

Confirm directionality of genes from GSEA output

Make plot to make sure I have the directionality of the DE genes correct ::: {.cell}

Code
kegg_ribo_genes <-
    msigdbr::msigdbr(species = "Mus musculus",
                     category = "C2",
                     subcategory = "KEGG") %>%
    filter(gs_name == "KEGG_RIBOSOME") %>%
    split(x = .$gene_symbol, f = .$gs_name)

tumor_only <-
    AddModuleScore(tumor_only,
                   features = kegg_ribo_genes,
                   name = "kegg_ribo_module")
Warning: The following features are not present in the object: Rpl10l, not
searching for symbol synonyms
Code
tumor_only %>%
    subset(clust_2 %in% c("clust_2.0", #high
                          "clust_2.1", #high
                          "clust_0.1", #low
                          "clust_5.1")) %>% #low
    VlnPlot(split.by = "sample",
            features = "kegg_ribo_module1",
            group.by = "clust_2")
The default behaviour of split.by has changed.
Separate violin plots are now plotted side-by-side.
To restore the old behaviour of a single split violin,
set split.plot = TRUE.
      
This message will be shown once per session.

Code
FeaturePlot(immune_recurl,
            features = kegg_ribo_genes[[1]])
Warning: The following requested variables were not found: Rpl10l

Code
tumor_only %>%
    AddMetaData(metadata = paste(tumor_only$clust_2, tumor_only$comp),
                col.name = "clust_comp") %>%
    subset(clust_2 %in% c("clust_2.0", #high
                          "clust_2.1", #high
                          "clust_0.1", #low
                          "clust_5.1")) %>% #low
    DotPlot(features =  unique(kegg_ribo_genes[[1]]),
            group.by = "clust_comp")
Warning: The following requested variables were not found: Rpl10l

:::

Run GSEA to compare each cluster to all others

Calculate logfc between each cluster and all others

Code
all_logfc_clust <-
    sapply(unique(tumor_only$clust_2),
           USE.NAMES = TRUE,
           function(x) {
        test_data <- tumor_only
        test_data$comp <-
            if_else(test_data$clust_2 == x,
                    x,
                    "other")

        output <-
            FoldChange(test_data,
                       group.by = "comp",
                       ident.1 = x,
                       ident.2 = "other") %>%
            arrange(desc(avg_log2FC)) %>%
            rownames_to_column("gene") %>%
            pull(avg_log2FC, name = "gene")
        return(output)
    })

Run the GSEA

Code
set.seed(1337)
gsea_out_clust <-
    parallel::mclapply(seq_len(nrow(cat_tib)),
                       mc.cores = 5,
                       function(i) {
        message(cat_tib$label[i])
        kegg_ref <-
            msigdbr::msigdbr(species = "Mus musculus",
                            category = cat_tib$category[i],
                            subcategory = cat_tib$subcat[i]) %>%
                split(x = .$gene_symbol, f = .$gs_name)

            parallel::mclapply(names(all_logfc),
                            mc.cores = 5,
                            function(x) {
                fgsea::fgseaMultilevel(kegg_ref,
                                       all_logfc[[x]],
                                       minSize = 15,
                                       maxSize = 500,
                                       nPerm = 1000) %>%
                    arrange(desc(NES)) %>%
                    mutate(pathway = str_replace_all(pathway, "_", " "),
                           cluster = x,
                           category = cat_tib$category[i],
                           sub_cat = cat_tib$subcat[i],
                           cat_label = cat_tib$label[i]) %>%
                    filter(padj < 0.05)
            }) %>%
            bind_rows()
    }) %>%
    bind_rows()

Plot GSEA output

Code
lapply(unique(gsea_out_clust$cat_label),
       function(x) {
    top_paths <-
        gsea_out_clust %>%
        filter(cat_label == x) %>%
        group_by(cluster) %>%
        arrange(desc(abs(NES)), .by_group = TRUE) %>%
        slice_head(n = 20) %>%
        pull(pathway) %>%
        str_wrap(width = 80)
    gsea_out_clust %>%
        filter(pathway %in% top_paths) %>%
        pivot_wider(names_from = pathway,
                    values_from = NES,
                    id_cols = c(cluster),
                    values_fill = 0) %>%
        column_to_rownames("cluster") %>%
        t() %>%
        pheatmap::pheatmap(main = x,
                           filename = paste0("output/figures/gsea_by_cluster_",
                                             x,
                                             ".pdf"),
                           width = 8,
                           height = 12,
                           fontsize_row = 6)
    })
[[1]]

[[2]]

[[3]]

[[4]]

[[5]]

Run analyses on the murine stroma cells

Epithelial cells from both K7M2 and F480 models

Code
# Load both models, merge data, and integrate by model
if(!file.exists("output/rdata/murine_aec_post-filtering.qs")) {
    if(!file.exists("output/rdata/murine_aec_pre-filtering.qs")) {
        balb_k7m2_combined <- qs::qread("output/rdata/balb_k7m2_combined.qs")
        b6_f420_combined <- qs::qread("output/rdata/b6_f420_combined.qs")

        murine_aec <- merge(
            subset(balb_k7m2_combined, idents = "Lower Airway"),
            subset(b6_f420_combined, idents = "Lower Airway"))
        DefaultAssay(murine_aec) <- "RNA"

        murine_aec$model <- murine_aec$sample
        murine_aec$model[murine_aec$model == "F420"] <- "C57BL6"
        murine_aec$model[murine_aec$model == "K7M2"] <- "BALBC"
        murine_aec$type <- murine_aec$sample
        murine_aec$type[murine_aec$type == "F420"] <- "Metastasis"
        murine_aec$type[murine_aec$type == "K7M2"] <- "Metastasis"
        murine_aec$type[murine_aec$type == "C57BL6"] <- "Healthy"
        murine_aec$type[murine_aec$type == "BALBC"] <- "Healthy"

        murine_aec <- DietSeurat(murine_aec, assays = "RNA") %>%
            FindVariableFeatures() %>%
            ScaleData() %>%
            RunPCA(npcs = 30, vebose = FALSE) %>%
            harmony::RunHarmony(group.by.vars = c("model")) %>%
            RunUMAP(reduction = "harmony", dims = 1:30) %>%
            FindNeighbors(reduction = "umap", dims = 1:2) %>%
            FindClusters(resolution = 0.2)

        r_dim_plot(murine_aec)
        r_dim_plot(murine_aec, group.by = "sample", shuffle = TRUE)
        r_dim_plot(murine_aec, group.by = "type", shuffle = TRUE)

        # Remove contaminating non-epithelial and immune cells
        murine_aec_marks <- FindAllMarkers(murine_aec)

        gt::gt(murine_aec_marks %>%
            group_by(cluster) %>%
            slice_max(avg_log2FC, n = 20) %>%
            arrange(cluster, -avg_log2FC)) %>%
            gt::tab_header("DEGs in human AECs (by cluster)")

        r_dim_plot(murine_aec, group.by = "cell_type", repel = TRUE)

        qs::qsave(murine_aec,
            "output/rdata/murine_aec_pre-filtering.qs")
    } else {
        murine_aec <- qs::qread("output/rdata/murine_aec_pre-filtering.qs")
    }
    murine_aec <- subset(murine_aec,
        idents = c("4", "5", "9", "10", "11", "12"),
        invert = TRUE)
    murine_aec <- subset(murine_aec,
        subset = cell_type == c("Mono", "Int Mf", "Alv Mf"),
        invert = TRUE)

    # Repeat processing with the cleaned-up dataset
    murine_aec <- murine_aec %>%
        FindVariableFeatures() %>%
        ScaleData() %>%
        RunPCA(npcs = 30, vebose = FALSE) %>%
        harmony::RunHarmony(group.by.vars = c("model")) %>%
        RunUMAP(reduction = "harmony", dims = 1:30, seed.use = 222) %>%
        FindNeighbors(reduction = "umap", dims = 1:2) %>%
        FindClusters(resolution = 0.15)

    r_dim_plot(murine_aec)
    r_dim_plot(murine_aec, group.by = "cell_type", repel = TRUE)

    # Still some upper airway cells carried over--remove them
    murine_aec <- subset(murine_aec, idents = "4", invert = TRUE)

    # Repeat processing with the cleaned-up dataset
    murine_aec <- murine_aec %>%
        FindVariableFeatures() %>%
        ScaleData() %>%
        RunPCA(npcs = 30, vebose = FALSE) %>%
        harmony::RunHarmony(group.by.vars = c("model")) %>%
        RunUMAP(reduction = "harmony", dims = 1:30, seed.use = 888) %>%
        FindNeighbors(reduction = "umap", dims = 1:2) %>%
        FindClusters(resolution = 0.15)

    r_dim_plot(murine_aec)
    r_dim_plot(murine_aec, group.by = "cell_type", repel = TRUE)

    qs::qsave(murine_aec,
        "output/rdata/murine_aec_post-filtering.qs")
} else {
    murine_aec <- qs::qread("output/rdata/murine_aec_post-filtering.qs")
}

r_dim_plot(murine_aec)

Code
r_dim_plot(murine_aec, group.by = "sample", shuffle = TRUE)

Code
r_dim_plot(murine_aec, group.by = "type", shuffle = TRUE)

Code
r_dim_plot(murine_aec, group.by = "cell_type", repel = TRUE)

Code
murine_aec_marks_2 <- FindAllMarkers(murine_aec)
Calculating cluster 0
Calculating cluster 1
Calculating cluster 2
Calculating cluster 3
Calculating cluster 4
Calculating cluster 5
Calculating cluster 6
Code
gt::gt(murine_aec_marks_2 %>%
    group_by(cluster) %>%
    slice_max(avg_log2FC, n = 20) %>%
    arrange(cluster, -avg_log2FC)) %>%
    gt::tab_header("DEGs in human AECs (by cluster)")
DEGs in human AECs (by cluster)
p_val avg_log2FC pct.1 pct.2 p_val_adj gene
0
5.541374e-12 4.736622 0.064 0.002 1.200317e-07 Batf2
5.585916e-07 4.348431 0.053 0.008 1.209965e-02 Meg3
3.144485e-03 4.156957 0.120 0.184 1.000000e+00 Hbb-bs
8.484552e-10 4.073922 0.056 0.004 1.837839e-05 6330403K07Rik
1.535815e-41 4.018127 0.275 0.025 3.326729e-37 BC030867
2.018068e-03 3.994947 0.015 0.001 1.000000e+00 Sprr2a3
5.931362e-16 3.772867 0.109 0.011 1.284792e-11 Gm8113
1.261507e-42 3.769141 0.272 0.021 2.732549e-38 Ypel4
9.552180e-10 3.716149 0.056 0.004 2.069098e-05 Gbp10
9.615211e-06 3.683296 0.023 0.000 2.082751e-01 1600014C23Rik
1.691137e-13 3.610533 0.087 0.007 3.663172e-09 Sstr2
3.239853e-03 3.588274 0.010 0.000 1.000000e+00 Scml4
3.239853e-03 3.576033 0.010 0.000 1.000000e+00 Atp1b2
3.239853e-03 3.385071 0.010 0.000 1.000000e+00 Tnk2os
3.070753e-24 3.266213 0.181 0.021 6.651557e-20 Gdpd2
3.239853e-03 3.258709 0.010 0.000 1.000000e+00 Kcne1l
3.221719e-04 3.247627 0.023 0.002 1.000000e+00 Gm16754
9.615211e-06 3.192686 0.023 0.000 2.082751e-01 Mettl7b
1.221563e-05 3.185739 0.031 0.002 2.646028e-01 Slc16a4
3.050274e-28 3.139898 0.214 0.026 6.607199e-24 St8sia6
1
2.184753e-31 5.650829 0.154 0.002 4.732394e-27 Ddx3y
4.325329e-30 5.245884 0.148 0.002 9.369096e-26 Eif2s3y
3.642497e-16 4.917855 0.077 0.001 7.890012e-12 Uty
1.680194e-09 4.858046 0.044 0.001 3.639469e-05 Kdm5d
5.932863e-05 4.075664 0.018 0.000 1.000000e+00 AY512931
2.483163e-04 4.033768 0.015 0.000 1.000000e+00 AC123724.1
2.178849e-34 3.945275 0.263 0.036 4.719605e-30 Cst8
2.375856e-21 3.918342 0.195 0.036 5.146342e-17 Pon1
4.824532e-05 3.885049 0.030 0.003 1.000000e+00 Kcnk3
5.932863e-05 3.840875 0.018 0.000 1.000000e+00 Gm37885
2.202719e-03 3.808520 0.021 0.003 1.000000e+00 Sox2
1.054110e-03 3.772483 0.012 0.000 1.000000e+00 Ly9
1.054110e-03 3.743596 0.012 0.000 1.000000e+00 Pou4f1
4.849718e-04 3.716285 0.018 0.001 1.000000e+00 Gm37634
1.054110e-03 3.633946 0.012 0.000 1.000000e+00 Myh11
1.901229e-06 3.611582 0.030 0.001 4.118252e-02 Fez1
5.065072e-05 3.561362 0.033 0.004 1.000000e+00 Aox4
1.942165e-03 3.558802 0.024 0.004 1.000000e+00 Klkb1
2.399516e-21 3.541194 0.163 0.021 5.197592e-17 Slc10a6
1.443651e-20 3.427044 0.148 0.017 3.127093e-16 Fabp12
2
1.283190e-03 5.574496 0.010 0.000 1.000000e+00 Apod
1.657503e-11 5.571988 0.055 0.002 3.590318e-07 Pyy
2.053796e-09 5.395828 0.050 0.003 4.448727e-05 Psca
1.283190e-03 5.360392 0.010 0.000 1.000000e+00 Gm3776
2.303831e-08 5.359085 0.030 0.000 4.990328e-04 Cwh43
3.104525e-03 5.160577 0.020 0.003 1.000000e+00 Brsk1
7.560888e-03 4.890072 0.015 0.002 1.000000e+00 Gm16365
5.161277e-06 4.820284 0.020 0.000 1.117984e-01 Sctr
1.283190e-03 4.815814 0.010 0.000 1.000000e+00 Gm33104
1.762639e-10 4.807519 0.055 0.003 3.818052e-06 Il23a
4.398925e-56 4.778773 0.423 0.053 9.528511e-52 Cldn4
3.424746e-07 4.754732 0.025 0.000 7.418343e-03 Odaph
1.283190e-03 4.689670 0.010 0.000 1.000000e+00 Gm26513
7.433167e-06 4.648595 0.025 0.001 1.610098e-01 Tnfrsf22
9.504260e-12 4.647243 0.134 0.028 2.058718e-07 Ltbp2
3.424746e-07 4.644547 0.025 0.000 7.418343e-03 Hnf4a
1.028469e-04 4.590917 0.020 0.001 1.000000e+00 Adamts7
3.670336e-11 4.576747 0.070 0.006 7.950316e-07 G930009F23Rik
1.283190e-03 4.569136 0.010 0.000 1.000000e+00 4932438H23Rik
1.283190e-03 4.558208 0.010 0.000 1.000000e+00 Gm34059
3
2.715218e-03 5.552258 0.014 0.001 1.000000e+00 Folr1
6.682744e-05 5.089513 0.021 0.001 1.000000e+00 Iglv1
1.537829e-06 5.036642 0.021 0.000 3.331091e-02 Timm8a2
9.499687e-12 4.750228 0.056 0.002 2.057727e-07 Bcas3os2
3.212786e-08 4.690474 0.035 0.001 6.959217e-04 Zmynd12
8.777405e-05 4.530361 0.014 0.000 1.000000e+00 Golt1a
8.777405e-05 4.499046 0.014 0.000 1.000000e+00 Pabpc2
1.537829e-06 4.495782 0.021 0.000 3.331091e-02 Smyd1
8.777405e-05 4.194441 0.014 0.000 1.000000e+00 Gm553
1.537829e-06 4.188504 0.021 0.000 3.331091e-02 Zdhhc19
1.818920e-09 4.152536 0.056 0.004 3.939963e-05 Pnliprp1
2.715218e-03 4.137747 0.014 0.001 1.000000e+00 Cryba1
8.777405e-05 4.119858 0.014 0.000 1.000000e+00 Chst4
8.777405e-05 4.099140 0.014 0.000 1.000000e+00 Dydc1
1.423950e-04 3.984451 0.028 0.003 1.000000e+00 Bcl2l15
5.662824e-06 3.963174 0.035 0.003 1.226624e-01 Fam181a
8.777405e-05 3.938273 0.014 0.000 1.000000e+00 Gm29542
2.983336e-03 3.779285 0.021 0.003 1.000000e+00 Gm14322
8.777405e-05 3.770105 0.014 0.000 1.000000e+00 Emx2
2.715218e-03 3.746976 0.014 0.001 1.000000e+00 Gm29776
4
2.786747e-124 7.932019 0.592 0.009 6.036372e-120 Knl1
6.317385e-91 7.718180 0.394 0.003 1.368409e-86 Depdc1a
1.062844e-41 7.594669 0.155 0.000 2.302226e-37 Hist1h2af
7.207610e-123 7.438662 0.577 0.009 1.561240e-118 Ckap2l
3.581883e-87 7.425394 0.380 0.003 7.758717e-83 Mxd3
6.874478e-83 7.328732 0.394 0.006 1.489081e-78 Sapcd2
1.986381e-103 7.288986 0.493 0.008 4.302700e-99 Aspm
4.868580e-49 7.253169 0.183 0.000 1.054583e-44 Pif1
1.241872e-93 7.223066 0.437 0.006 2.690019e-89 Sgo1
2.109071e-143 7.186825 0.732 0.016 4.568458e-139 Cenpe
1.276905e-56 7.075396 0.239 0.002 2.765905e-52 Prr11
2.201711e-46 6.990424 0.211 0.003 4.769125e-42 Efcab11
3.254199e-121 6.987737 0.563 0.008 7.048920e-117 Plk1
1.527981e-89 6.825270 0.746 0.057 3.309760e-85 Ube2c
7.827085e-61 6.810393 0.268 0.003 1.695425e-56 Hist1h2ab
9.181553e-94 6.766417 0.507 0.013 1.988816e-89 Spag5
2.894169e-131 6.732935 0.592 0.007 6.269060e-127 Kif20a
2.498557e-49 6.708089 0.211 0.002 5.412125e-45 Kif18b
1.276528e-123 6.698759 0.704 0.022 2.765088e-119 Hmmr
7.323690e-118 6.697604 0.648 0.018 1.586384e-113 Cdkn3
5
8.634885e-96 10.851032 0.359 0.000 1.870402e-91 Gpm6a
1.763092e-101 9.299257 0.625 0.022 3.819034e-97 Igfbp2
6.601087e-60 8.927205 0.266 0.003 1.429861e-55 Rapgef4
1.821008e-49 8.254599 0.266 0.006 3.944485e-45 Rasgrf2
7.170877e-22 8.228981 0.078 0.000 1.553284e-17 Kcnb1
4.339783e-34 8.005524 0.125 0.000 9.400404e-30 Sybu
1.799399e-05 7.852658 0.016 0.000 3.897678e-01 Cntn4
1.799399e-05 7.852658 0.016 0.000 3.897678e-01 Gm34907
5.073373e-117 7.773545 0.703 0.024 1.098943e-112 Spock2
1.022456e-13 7.618520 0.047 0.000 2.214743e-09 Npy
4.924934e-27 7.548615 0.125 0.002 1.066790e-22 Spink4
8.495495e-18 7.367688 0.062 0.000 1.840209e-13 Calcr
9.890234e-19 7.033478 0.109 0.003 2.142323e-14 Lrtm2
3.875216e-23 6.989380 0.109 0.002 8.394106e-19 Stk32c
7.125636e-39 6.934464 0.172 0.002 1.543484e-34 Wnt10b
3.973923e-10 6.801500 0.062 0.003 8.607915e-06 Eya4
2.406016e-10 6.770012 0.047 0.001 5.211671e-06 Igf2bp1
8.435950e-12 6.739909 0.062 0.002 1.827311e-07 Rspo1
1.280910e-09 6.719020 0.031 0.000 2.774580e-05 D030018L15Rik
3.973923e-10 6.688989 0.062 0.003 8.607915e-06 Egflam
6
1.445796e-09 6.253649 0.030 0.000 3.131740e-05 Cacnb2
5.014759e-12 6.000951 0.061 0.001 1.086247e-07 Gm14493
1.445796e-09 5.977239 0.030 0.000 3.131740e-05 Foxc2
1.445796e-09 5.977239 0.030 0.000 3.131740e-05 Gm41318
1.445796e-09 5.949047 0.030 0.000 3.131740e-05 Hemgn
1.445796e-09 5.949047 0.030 0.000 3.131740e-05 Gm9754
1.445796e-09 5.941651 0.030 0.000 3.131740e-05 Gm44037
1.445796e-09 5.920404 0.030 0.000 3.131740e-05 Gm14149
1.445796e-09 5.889054 0.030 0.000 3.131740e-05 Fsd2
1.445796e-09 5.889054 0.030 0.000 3.131740e-05 Phyhipl
1.445796e-09 5.889054 0.030 0.000 3.131740e-05 Gm28379
1.445796e-09 5.889054 0.030 0.000 3.131740e-05 AC130815.3
1.445796e-09 5.866441 0.030 0.000 3.131740e-05 Drd4
5.014759e-12 5.849725 0.061 0.001 1.086247e-07 Pdzd9
5.014759e-12 5.787165 0.061 0.001 1.086247e-07 Gm39228
3.091143e-05 5.718588 0.030 0.001 6.695725e-01 Gm5160
1.445796e-09 5.696732 0.030 0.000 3.131740e-05 Trim10
1.445796e-09 5.693598 0.030 0.000 3.131740e-05 AI849053
1.445796e-09 5.693598 0.030 0.000 3.131740e-05 Epha10
1.445796e-09 5.693598 0.030 0.000 3.131740e-05 Gm6904
1.445796e-09 5.693598 0.030 0.000 3.131740e-05 Vmn2r18
1.445796e-09 5.693598 0.030 0.000 3.131740e-05 AC119264.1
Code
# Create the expression dot plot
gl <- c("Sftpa1", "Sftpc", "Lgi3",
"Lcn2", "Lrg1", "Glrx",
"Top2a", "Mki67", "Cenpf",
"Areg", "Krt8", "Cdkn1a",
"Pdpn", "Hopx", "Cav1")

DotPlot(murine_aec, features = gl, cols = "RdBu") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

Code
# Cluster identification based on published data sets
# https://doi.org/10.1038/s41467-020-17358-3
# https://doi.org/10.1038/s41556-020-0542-8
# https://doi.org/10.1016/j.stem.2020.06.020
murine_aec <- RenameIdents(murine_aec,
    `0` = "DATP",
    `1` = "AEC2",
    `2` = "DATP",
    `3` = "pAEC2",
    `4` = "cAEC2",
    `5` = "AEC1",
    `6` = "cAEC2")

murine_aec$aec_type <- Idents(murine_aec)
murine_aec$aec_type <- factor(murine_aec$aec_type,
    levels = c("AEC1", "DATP", "cAEC2", "pAEC2", "AEC2"))

DotPlot(murine_aec,
    features = gl,
    cols = "RdBu",
    group.by = "aec_type",
    dot.scale = 8) +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

Code
r_dim_plot(murine_aec, split.by = "type")

Code
r_dim_plot(murine_aec, group.by = "sample", shuffle = TRUE)

Code
r_dim_plot(murine_aec, group.by = "type", shuffle = TRUE)

Code
r_dim_plot(murine_aec, group.by = "cell_type", repel = TRUE)

Code
# balbc_k7m2_combined.AEC_1$celltype_sample <-
#     paste(Idents(balbc_k7m2_combined.AEC_1),
#           balbc_k7m2_combined.AEC_1$sample,
#           sep = "_")
# balbc_k7m2_combined.AEC_1$celltype <- Idents(balbc_k7m2_combined.AEC_1)

Run analyses on human stroma cells

Set up automated cell type annotation

Code
lung_ref <- qs::qread("LungRefs/normal_lung.qs")
lung_ref$free_annotation <-
    stringr::str_replace_all(lung_ref$free_annotation,
        c("/" = "_", "\\+" = "_plus"))
lung_ref$free_annotation <-
    stringr::str_replace_all(lung_ref$free_annotation,
        c("Alveolar Epithelial Type 1" = "Alveolar Epithelial",  #nolint
            "Alveolar Epithelial Type 2" = "Alveolar Epithelial",  #nolint
            "Basophil_Mast 1" = "Basophil_Mast",
            "Basophil_Mast 2" = "Basophil_Mast",
            "Bronchial Vessel 1" = "Bronchial Vessel",
            "Bronchial Vessel 2" = "Bronchial Vessel",
            "IGSF21_plus Dendritic" = "Dendritic cells",
            "Myeloid Dendritic Type 1" = "Myeloid Dendritic",   #nolint
            "Myeloid Dendritic Type 2" = "Myeloid Dendritic",   #nolint
            "Natural Killer" = "NK cells",
            "Natural Killer T" = "NKT",
            "NK cells T" = "NKT",
            "CD4_plus Memory/Effector T" = "CD4+ T cells",
            "CD4_plus Naive T" = "CD4+ T cells",
            "CD8_plus Memory_Effector T" = "CD8+ T cells",
            "CD8_plus Naive T" = "CD8+ T cells",
            "Proliferating NK_T" = "NKT",
            "TREM2_plus Dendritic" = "Dendritic cells",
            "Proliferating Macrophage" = "Macrophage",
            "Nonclassical Monocyte" = "Monocytes",
            "Classical Monocyte" = "Monocytes"))

mets_ref <- qs::qread("LungRefs/human_lung_met_ref.qs")

annotate <- function(sobject,
    aggr_ref = TRUE,
    label_type = "label.main",
    ...) {
        # hpca <- celldex::HumanPrimaryCellAtlasData()
        # hpca$label.main <-
        #     stringr::str_replace_all(hpca$label.main,
        #         c("T_cells" = "T cells",
        #         "B_cell" = "B cells",
        #         "NK_cell" = "NK cells",
        #         "Monocyte" = "Monocytes",
        #         "DC" = "Dendritic cells"))
        # huim <- celldex::MonacoImmuneData()
        # ref3 <- GetAssayData(lung_ref)
        ref <- list(mets_ref)
        labels <- list(
            # hpca[[label_type]],
            # huim[[label_type]],
            # lung_ref$free_annotation,
            mets_ref$label)
        annotation <-
            SingleR::SingleR(test = Seurat::as.SingleCellExperiment(sobject),
                ref = ref,
                labels = labels,
                aggr.ref = aggr_ref,
                BPPARAM = BiocParallel::MulticoreParam(
                    parallelly::availableCores()),
                ...)
        sobject$annotations <- annotation$labels
        sobject$cell_scores <-
            apply(X = annotation$scores,
                MARGIN = 1,
                function(x) max(x, na.rm = TRUE))
        return(sobject)
    }

Load the normal human lung data

GSE227136 contains a very large control normal lung dataset. We will extract normal lung scRNA-seq from this.

The download will contain a subset of the normal lung tissue samples, balanced for whole cell and nucleus based RNAseq, from the Human Lung Cell Atlas project.

NOTE: If you haven’t already downloaded and processed, you might need to run this step on a himem node.

Code
# if(!file.exists("hlca/nl_lung.qs")) {
#     if (!file.exists("~/lab/GenRef/sc_ref_datasets/human/hlca.qs")) {
#         options(
#             reticulate.conda_binary = "/gpfs0/export/apps/easybuild/software/Miniconda3/4.9.2/condabin/conda",
#             SCP_env_name = "r")
#         library(SCP)
#         library(reticulate)
#         sc <- import("scanpy")

#         # Download the annotated full dataset from the Human Lung Cell Atlas project
#         # https://doi.org/10.1038/s41591-023-02327-2
#         system(
#             "curl https://datasets.cellxgene.cziscience.com/3ab47484-a3eb-4f6a-beea-670e1a8fc1e8.h5ad --output ~/lab/GenRef/sc_ref_datasets/human/hlca.h5ad"
#         )

#         # Need to invoke BPCells to handle the large object until we simplify
#         hlca.data <- BPCells::open_matrix_anndata_hdf5(
#             path = "~/rxr014/hlca.h5ad"
#         )

#         # Write the matrix to a directory
#         BPCells::write_matrix_dir(mat = hlca.data, dir = "hlca/")

#         # Now that we have the matrix on disk, we can load it
#         hlca.mat <- BPCells::open_matrix_dir(dir = "hlca/")
#         hlca.mat <- Azimuth:::ConvertEnsembleToSymbol(mat = hlca.mat, species = "human")
#         hlca.meta <- Azimuth::LoadH5ADobs("~/rxr014/hlca.h5ad")

#         # Create Seurat object (just metadata and raw counts)
#         hlca <- SeuratObject::CreateSeuratObject(counts = hlca.mat, meta.data = hlca.meta)

#         # Write the simplified Seurat object
#         qs::qsave(hlca, "~/lab/GenRef/sc_ref_datasets/human/hlca.qs")

#     } else {
#         hlca <- qs::qread("~/lab/GenRef/sc_ref_datasets/human/hlca.qs")
#     }

#     cases <- as.data.frame(hlca@meta.data) %>%
#         select(
#             sample,
#             age_or_mean_of_age_range,
#             smoking_status,
#             subject_type,
#             tissue,
#             lung_condition,
#             assay,
#             fresh_or_frozen
#             ) %>%
#         filter(
#             !tissue %in% c("respiratory airway", "nose"),
#             lung_condition == "Healthy",
#             assay == "10x 3' v3",
#             age_or_mean_of_age_range > 1,
#             smoking_status != 0
#             ) %>%
#         unique()

#     table(cases$fresh_or_frozen)

#     nl_lung <- subset(hlca, subset = sample %in% cases$sample)
#     nl_lung$type <- "Healthy"

#     nl_lung <- SplitObject(nl_lung, split.by = "sample")

#     qs::qsave(nl_lung, "hlca/nl_lung.qs")

# } else {
#     nl_lung <- qs::qread("hlca/nl_lung.qs")
# }

nl_lung <- qs::qread("LungRefs/hlca_healthy_sample.qs")
nl_lung$type <- "Healthy"

Load the osteosarcoma lung metastases

Code
if(!file.exists("output/rdata/met_lungs.qs")) {
    mets_meta <- tribble(
        ~file, ~Sample_Name, ~Sample_Source, ~Sample_Type, ~method,
        "~/lab/Counts/S0058/outs/filtered_feature_bc_matrix/", "S0058", "Roberts Lab", "Metastasis", "cell",
        "~/lab/Counts/S0059/outs/filtered_feature_bc_matrix/", "S0059", "Roberts Lab", "Metastasis", "cell",
        "~/lab/Counts/S0217/filtered_feature_bc_matrix", "S0217", "Roberts Lab", "Metastasis", "nucleus",
        "~/lab/Counts/S0218/filtered_feature_bc_matrix", "S0218", "Roberts Lab", "Metastasis", "nucleus",
        "~/lab/Counts/SC069/filtered_feature_bc_matrix/", "SC069", "Roberts Lab", "Metastasis", "nucleus",
        "~/lab/Counts/SC072/filtered_feature_bc_matrix/", "SC072", "Roberts Lab", "Metastasis", "nucleus",
        "~/lab/Counts/SC073/filtered_feature_bc_matrix/", "SC073", "Roberts Lab", "Metastasis", "nucleus")

    met_lung <- mclapply(1:nrow(mets_meta),
        function(m) {
            message(paste("Starting to load", mets_meta$Sample_Name[m], "..."))
            s <- tenx_load_qc(mets_meta$file[m], violin_plot = FALSE) %>%
                NormalizeData()
            s$id <- mets_meta$Sample_Name[m]
            s$src <- mets_meta$Sample_Source[m]
            s$type <- mets_meta$Sample_Type[m]
            s$method <- mets_meta$method[m]
            message(paste(mets_meta$Sample_Name[m], "completed."))
            return(s)
        }, mc.cores = parallelly::availableCores())
    names(met_lung) <- mets_meta$Sample_Name

    # Add samples from the Soragnyi lab (ALSF scPCA)
    # cite https://doi.org/10.1101/2023.05.25.542375
    mets_meta2 <- tribble(
        ~file, ~Sample_Name, ~Sample_Source, ~Sample_Type, ~method,
        "../../../ExternalData/alsf/Soragni/SCPCS000522/SCPCL000854_filtered.rds", "SCPCS000522", "ALSF", "Metastasis", "cell",
        "../../../ExternalData/alsf/Soragni/SCPCS000523/SCPCL000855_filtered.rds", "SCPCS000523", "ALSF", "Metastasis", "cell",
        "../../../ExternalData/alsf/Soragni/SCPCS000524/SCPCL000856_filtered.rds", "SCPCS000524", "ALSF", "Metastasis", "cell",
        "../../../ExternalData/alsf/Soragni/SCPCS000525/SCPCL000857_filtered.rds", "SCPCS000525", "ALSF", "Metastasis", "cell",
        "../../../ExternalData/alsf/Soragni/SCPCS000526/SCPCL000858_filtered.rds", "SCPCS000526", "ALSF", "Metastasis", "cell")

    # Create a function to convert Ensembl IDs to gene symbols
    mapIds_2 <- function(IDs, IDFrom, IDTo) {
        require(org.Hs.eg.db)
        idmap <- mapIds(org.Hs.eg.db,
            keys = IDs,
            column = IDTo,
            keytype = IDFrom,
            multiVals = "first")
        na_vec <- names(idmap[is.na(idmap) == TRUE])
        idmap <- idmap[is.na(idmap) == FALSE]
        idmap_df <- data.frame(
            "From"=names(idmap),
            "To"=unlist(unname(idmap)),
            stringsAsFactors = FALSE)
        return(list(map=idmap_df, noMap=na_vec))
    }

    met_lung2 <- mclapply(1:nrow(mets_meta2),
        function(m) {
            message(paste("Starting to load", mets_meta2$Sample_Name[m], "..."))
            s <- readRDS(mets_meta2$file[m]) %>%
                scater::logNormCounts() %>%
                as.Seurat() %>%
                NormalizeData()
            s$id <- mets_meta2$Sample_Name[m]
            s$src <- mets_meta2$Sample_Source[m]
            s$type <- mets_meta2$Sample_Type[m]
            s$method <- mets_meta2$method[m]
            d <- GetAssayData(s, "originalexp")
            remap <- mapIds_2(rownames(d), "ENSEMBL", "SYMBOL")
            d <- d[remap$map$From, ]
            rownames(d) <- left_join(
                as_tibble(rownames(d)),
                remap[["map"]],
                join_by(value == From)) %>%
                pull(To)
            s[["RNA"]] <- CreateAssayObject(rowsum(d, row.names(d)))
            Seurat::DefaultAssay(s) <- "RNA"
            s[["originalexp"]] <- NULL
            message(paste(mets_meta2$Sample_Name[m], "completed."))
            return(s)
        }, mc.cores = parallelly::availableCores())
    names(met_lung2) <- mets_meta2$Sample_Name

    # Combine the two lists of Seurat objects
    met_lung <- c(met_lung, met_lung2)

    qs::qsave(met_lung, "output/rdata/met_lungs.qs")
} else {
    met_lung <- qs::qread("output/rdata/met_lungs.qs")
}

Find and exclude tumor cells from met samples

NOTE: if tumor cells have not been previously assigned for any of the samples, this script will error out with a message to run that code and create those samples before proceding. ::: {.cell}

Code
if(!file.exists("misc/human_met_labels.qs")) {
    sample_names <- names(met_lung)
    met_lung <- mclapply(seq_along(met_lung), function(s) {
        message(paste("Starting sample", names(met_lung)[s], "..."))
        o <- process_seurat(met_lung[[s]]) %>%
            annotate()
        message(paste0("Completed" , names(met_lung)[s], "."))
        return(o)
    }, mc.cores = parallelly::availableCores())
    names(met_lung) <- sample_names

    met_label <- merge(met_lung[[1]], met_lung[2:length(met_lung)],
        cell.ids <- names(met_lung)) %>%
            JoinLayers() %>%
            NormalizeData() %>%
            ScaleData() %>%
            FindVariableFeatures() %>%
            RunPCA(npcs = 30, verbose = FALSE) %>%
            harmony::RunHarmony(group.by.vars = c("id", "src", "method")) %>%
            RunUMAP(reduction = "harmony", dims = 1:30) %>%
            FindNeighbors(reduction = "harmony", dims = 1:30) %>%
            FindClusters(resolution = 0.5)

    p1 <- r_dim_plot(met_label, group.by = "seurat_clusters")
    p2 <- r_dim_plot(met_label, group.by = "annotations", repel = TRUE)
    p1 + p2

    gt::gt(met_label[[]] %>%
        select(seurat_clusters, annotations) %>%
        group_by(seurat_clusters, annotations) %>%
        tally() %>%
        arrange(seurat_clusters, -n))

    met_label <- RenameIdents(met_label,
        `0` = "Tumor",
        `1` = "Myeloid",
        `2` = "T/NK-cell",
        `3` = "Tumor",
        `4` = "Endothelial",
        `5` = "Myeloid",
        `6` = "Cycling",
        `7` = "Tumor",
        `8` = "Tumor",
        `9` = "Smooth_muscle",
        `10` = "Myeloid",
        `11` = "Epithelial",
        `12` = "B-cell",
        `13` = "Tumor",
        `14` = "Myeloid",
        `15` = "LowQ",
        `16` = "Myeloid",
        `17` = "Myeloid",
        `18` = "Endothelial",
        `19` = "Myeloid",
        `20` = "T/NK-cell",
        `21` = "LowQ")

    r_dim_plot(met_label, "Manual Annotations")
    met_label$met_labels <- Idents(met_label)

    qs::qsave(met_label$met_labels, "misc/human_met_labels.qs")
} else {
    met_lung <- merge(met_lung[[1]], met_lung[2:length(met_lung)],
        cell.ids <- names(met_lung)) %>%
            JoinLayers()
    met_lung <- subset(met_lung, cells = names(qs::qread("misc/human_met_labels.qs")))
    met_lung$met_labels <- qs::qread("misc/human_met_labels.qs")
    met_stroma <- subset(met_lung, met_labels %in% c("Tumor", "LowQ"), invert = TRUE)
}

:::

Combine datasets from normal and met

Need to rewrite to make integration conditional on having been previously run*** ::: {.cell}

Code
# Check to see if integrated dataset has already been processed
if(!file.exists("output/rdata/integrated_lungs.qs")) {
    nl_lung <- subset(nl_lung, tissue == "lung parenchyma")
    nl_lung$id <- nl_lung$sample
    nl_lung$src <- "HLCA"
    nl_lung$method <- nl_lung$fresh_or_frozen
    nl_lung$annotations <- nl_lung$ann_level_2

    met_stroma$annotations <- met_stroma$met_labels

    keep_features <- intersect(rownames(nl_lung), rownames(met_stroma))

    nl_lung <- nl_lung[rownames(nl_lung) %in% keep_features, ]
    met_stroma <- met_stroma[rownames(met_stroma) %in% keep_features, ]

    lungs <- merge(met_stroma, nl_lung) %>%
        JoinLayers() %>%
        NormalizeData() %>%
        ScaleData() %>%
        FindVariableFeatures() %>%
        RunPCA(npcs = 30, verbose = FALSE) %>%
        harmony::RunHarmony(group.by.vars = c("id")) %>%
        RunUMAP(reduction = "harmony", dims = 1:30) %>%
        FindNeighbors(reduction = "harmony", dims = 1:30) %>%
        FindClusters(resolution = 0.5)

    qs::qsave(lungs, "output/rdata/integrated_lungs.qs")
} else {
    lungs <- qs::qread("output/rdata/integrated_lungs.qs")
}

p1 <- r_dim_plot(subset(lungs, type == "Healthy"),
    group.by = "cell_type",
    repel = TRUE,
    shuffle = TRUE)
p2 <- r_dim_plot(subset(lungs, type == "Metastasis"),
    group.by = "met_labels",
    repel = TRUE,
    shuffle = TRUE)
p3 <- r_dim_plot(lungs, "Integrated Stroma")
p3 | p1 | p2
Warning: ggrepel: 41 unlabeled data points (too many overlaps). Consider
increasing max.overlaps

Code
r_dim_plot(lungs, split.by = "type", group.by = "cell_type", repel = TRUE)
Warning: Removed 41 rows containing missing values or values outside the scale range
(`geom_text_repel()`).
ggrepel: 41 unlabeled data points (too many overlaps). Consider increasing max.overlaps

Code
# Perform a semi-automated, manually verified cell type assignment
# ***This needs to be redone since re-building the healthy lung reference
if(!file.exists("misc/lungs_manual_assignment.qs")) {
    r_dim_plot(lungs, split.by = "type")

    gt::gt(lungs[[]] %>%
        select(seurat_clusters, annotations) %>%
        group_by(seurat_clusters, annotations) %>%
        tally() %>%
        arrange(seurat_clusters, -n))

    lungs <- RenameIdents(lungs,
        `0` = "Myeloid",
        `1` = "Airway epithelial",
        `2` = "Airway epithelial",
        `3` = "Osteosarcoma",
        `4` = "Endothelial",
        `5` = "Lymphoid",
        `6` = "Lymphoid",
        `7` = "Osteosarcoma",
        `8` = "Alveolar epithelial",
        `9` = "Cycling",
        `10` = "Fibroblast",
        `11` = "Smooth muscle",
        `12` = "Airway epithelial",
        `13` = "Smooth muscle",
        `14` = "Myeloid",
        `15` = "Lymphoid",
        `16` = "Osteosarcoma",
        `17` = "Endothelial",
        `18` = "Myeloid",
        `19` = "Myeloid",
        `20` = "Myeloid",
        `21` = "Myeloid",
        `22` = "Osteosarcoma",
        `23` = "Osteosarcoma")

    lungs$manual_assignments <- Idents(lungs)
    Idents(lungs) <- lungs$seurat_clusters

    qs::qsave(lungs$manual_assignments, "misc/lungs_manual_assignment.qs")

} else {
    lungs$manual_assignments <- qs::qread("misc/lungs_manual_assignment.qs")
}

# Clean up the carry over and reprocess
lungs <- subset(lungs, manual_assignments %in% c("Osteosarcoma"), invert = TRUE)

lungs$method[lungs$method == "fresh"] <- "cell"
lungs$method[lungs$method == "frozen"] <- "nucleus"

lungs[["RNA"]]$scale.data <- NULL
lungs[["RNA"]]$scale.data.1 <- NULL

lungs <- DietSeurat(lungs) %>%
    ScaleData() %>%
    FindVariableFeatures() %>%
    RunPCA(npcs = 30, verbose = FALSE) %>%
    harmony::RunHarmony(group.by.vars = c("id")) %>%
    RunUMAP(reduction = "harmony", dims = 1:30) %>%
    FindNeighbors(reduction = "harmony", dims = 1:30) %>%
    FindClusters(resolution = 0.5)
Centering and scaling data matrix
Finding variable features for layer counts
Warning in PrepDR5(object = object, features = features, layer = layer, : The
following features were not available: ANKRD28, ZBTB20, PAPPA, RASSF8, PTCHD4,
DLC1, IRX3, FLNB, CACNA1C, PLCL1, UNC5B, STK39, SYNE1, TTC28, LGALS1, FNBP1L,
LPP, PLS3, MYO6, TTC6, MIR99AHG, ABHD2, ARHGEF12, WWC2, PBX3, TNFRSF19, MEGF10,
MAMDC2-AS1, ACSS3, PAGE2B, CYP7B1, ZEB1, ACAN, TGFBR1, FILIP1L, PPP4R1, VIM,
FNDC3B, HSP90AA1, ANGPTL2, TJP1, LRP1B, SKOR2, TGFB2, PYCR1, NFXL1, TFPI,
STPG2, SPIDR, CNN3, PKIG, PPP1R1C, NUDT4, SMOC1, ANGPTL1, RRM2, RGS3, CEP112,
ANO3, MEOX2, GALNT2, PDGFRL, PFKP, CELSR1, FBXL2, PTMS, ASPN, CLRN1-AS1, AFAP1,
ITGA8, HMGN2, MIR646HG, CCNB1, GABBR2, RND3, ARNT2, LINC00845, KIAA0825, CDKN3,
TIPARP, CASC2, CPNE4, UBE2H, L3MBTL4, KCNG1, KCNJ2, RCN1, PTGIS, LINC01436,
DEPTOR, VCL, BAG3, EHD2, ITGB1, SPNS2, TMEM232, RHOJ, TSPAN18, RGS6, PPA1,
RNU1-73P, NR5A2, ZFP42, LINC00607, MTATP8P2, PTPN21, LINC00922, FOXC2, RRAS,
RAD51B, MBNL1, LDLRAD4, FERMT2, SLC14A2-AS1, FTX, FXYD1, DKK3, FJX1, EHBP1,
HPCAL1, CAPS, CACHD1, SLIT2-IT1, EXOC6B, MYADM, IGF1R, UST, TYR, GULP1, LGR5,
TMTC1, FLRT2, PLSCR4, MBD5, ZDHHC21, SH2D4A, ITGA3, EPB41L4B, RPL10AP2, ACTL8,
SSTR4, PLEKHA5, RHOBTB2, MACROD2-AS1, BMP4, GCNT2, AOC3, FAM133FP, SLC6A4,
FADS3, GSTP1, CPEB1-AS1, SLC44A3, PICSAR, FAM133A, PLLP, NOSTRIN, CAMKMT, PALM,
SLC22A9, TEX41, ELOVL6, TXN, PTPRE, DYSF, CDC42EP2, FARP1, GFPT2, GJB2, STK38L,
TLL1, BMPR1A, CDK1, ARHGAP32, DSG2-AS1, CASKIN2, CACNA1C-AS4, VSTM2A, LOXL2,
EXPH5, COL28A1, RHOXF2, RN7SL141P, DUOX1, SMYD1, HPGD, TMEM70, CUX2, LINC00921,
SVIL, PLA2G4E, PAGE5, DUSP26, IQCA1, TMEM204, ASAP2, PPP4R4, PLA2G5, CRTAM,
LINC01572, ARHGAP12, RGS21, LRRIQ1, LINC03053, AHCYL2, ZNF605, BOLA3P4, HDC,
CHRM1, CA10, ASS1, MBNL2, IL17F, GABRB2, RAB27A, LINC02855, POLQ, MMP12,
KRTAP12-3, NDFIP2, ATP1A1, PLCE1, ZNF608, LAMP5, TNFRSF13B, CENPM, ARSI,
LRRC26, ANLN, ATP1A2, PDLIM4, LRRC4C, RAPGEF3, SECISBP2L, PKP2, NRP1, UFL1-AS1,
ZNF486, FAP, LBH, HSPB6, NME1, DOCK1, TRBV13, EPHB4, TBX10, RPGR, MAML3, KCNK3,
ALLC, RN7SL405P, HMGN1P13, LINC00840, KANK1, SCN7A, DMRTC2, BCL6B, CRIM1, DKK1,
IGKV2D-29, TMEM54, SERPINB10, DLGAP5, SNX24, IL36B, PPIC, PGM2L1, CCN5, PNPLA6,
NOS3, EGLN3, CHRNA7, CORIN, ADARB2, DHH, RN7SK, SPTB.
Transposing data matrix
Initializing state using k-means centroids initialization
Harmony 1/10
Harmony 2/10
Harmony 3/10
Harmony 4/10
Harmony 5/10
Harmony 6/10
Harmony 7/10
Harmony converged after 7 iterations
19:14:26 UMAP embedding parameters a = 0.9922 b = 1.112
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
19:14:26 Read 62808 rows and found 30 numeric columns
19:14:26 Using Annoy for neighbor search, n_neighbors = 30
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
19:14:26 Building Annoy index with metric = cosine, n_trees = 50
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
19:14:35 Writing NN index file to temp file /gpfs0/scratch/5855649/RtmpfZhFWt/file1fad3078113a85
19:14:35 Searching Annoy index using 1 thread, search_k = 3000
19:15:02 Annoy recall = 100%
19:15:03 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
19:15:07 Initializing from normalized Laplacian + noise (using RSpectra)
19:15:13 Commencing optimization for 200 epochs, with 2889128 positive edges
19:15:41 Optimization finished
Computing nearest neighbor graph
Computing SNN
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 62808
Number of edges: 2342897

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9406
Number of communities: 23
Elapsed time: 20 seconds
Code
p1 <- r_dim_plot(lungs,
    group.by = "seurat_clusters",
    repel = TRUE,
    label.box = TRUE)
p2 <- r_dim_plot(subset(lungs, type == "Healthy"),
    group.by = "ann_level_3",
    repel = TRUE,
    label.box = TRUE)
p3 <- r_dim_plot(subset(lungs, type == "Metastasis"),
    group.by = "met_labels",
    repel = TRUE)
p4 <- r_dim_plot(lungs,
    group.by = "manual_assignments",
    repel = TRUE,
    label.box = TRUE)
(p1 + p2) / (p4 + p3)
Warning: ggrepel: 6 unlabeled data points (too many overlaps). Consider
increasing max.overlaps
Warning: ggrepel: 23 unlabeled data points (too many overlaps). Consider
increasing max.overlaps

Code
# Refine the cell type calls so that the final figure looks pretty
if(!file.exists("misc/human_mets_stroma_assignments.qs")) {
    lungs <- kill_cc(lungs)
    Idents(lungs) <- lungs$seurat_clusters

    lungs_marks_2 <- FindAllMarkers(lungs)

    gt::gt(lungs_marks_2 %>%
        group_by(cluster) %>%
        filter(pct.1 > 0.5) %>%
        slice_max(avg_log2FC, n = 20) %>%
        arrange(cluster, -avg_log2FC)) %>%
        gt::tab_header("DEGs by cluster")

    lungs <- RenameIdents(lungs,
        `0` = "Myeloid",
        `1` = "T/NK Cell",
        `2` = "Endothelial",
        `3` = "T/NK Cell",
        `4` = "Epithelial",
        `5` = "T/NK Cell",
        `6` = "Smooth Muscle", # Possible entanglement of tumor cells
        `7` = "Fibroblast", # Possible entaglement of tumor cells
        `8` = "Myeloid",
        `9` = "Tumor", # Cycling heavily, otherwise not matching
        `10` = "Tumor", # Cycling heavily, otherwise not matching
        `11` = "B Cell",
        `12` = "Myeloid",
        `13` = "Myeloid",
        `14` = "Tumor", # Matches OS markers very clearly
        `15` = "Endothelial",
        `16` = "Epithelial",
        `17` = "Endothelial",
        `18` = "Myeloid",
        `19` = "Epithelial",
        `20` = "Myeloid",
        `21` = "Myeloid",
        `22` = "Tumor") # Cycling heavily, otherwise not matching

    qs::qsave(Idents(lungs), "misc/human_mets_stroma_assignments.qs")
    lungs$manual_assignments_v2 <- Idents(lungs)
    Idents(lungs) <- lungs$seurat_clusters
} else {
    lungs$manual_assignments_v2 <- qs::qread("misc/human_mets_stroma_assignments.qs")
}

lungs <- subset(lungs, manual_assignments_v2 != "Tumor")

# Filter to even sample size for presentation
h_cells <- Cells(lungs)[which(lungs$type == "Healthy")]
m_cells <- Cells(lungs)[which(lungs$type == "Metastasis")] %>%
    sample(size = length(h_cells))
lungs_2 <- subset(lungs, cells = c(h_cells, m_cells))

r_dim_plot(lungs_2, group.by = "manual_assignments_v2", split.by = "type")

:::

Subcluster epithelial cells

Code
if(!file.exists("output/rdata/human_epithelial.qs")) {
    epi <- subset(lungs, manual_assignments_v2 == "Epithelial")

    # Reprocess the subsetted cells
    epi <- DietSeurat(epi, assays = "RNA") %>%
        FindVariableFeatures() %>%
        ScaleData() %>%
        RunPCA(npcs = 20, verbose = FALSE) %>%
        harmony::RunHarmony(
            group.by.vars = c("id", "method"), lambda = c(1, 0.5)) %>%
        RunUMAP(reduction = "harmony", dims = 1:20) %>%
        FindNeighbors(reduction = "harmony", dims = 1:20) %>%
        FindClusters(resolution = 0.3)

    r_dim_plot(epi, split.by = "type")

    p1 <- r_dim_plot(subset(epi, type == "Healthy"),
        group.by = "cell_type",
        repel = TRUE,
        shuffle = TRUE)
    p2 <- r_dim_plot(subset(epi, type == "Metastasis"),
        group.by = "annotations",
        repel = TRUE,
        shuffle = TRUE)
    p3 <- r_dim_plot(epi, "Integrated epithelial")
    p3 | p1 | p2

    epi_marks <- FindAllMarkers(epi)

    gt::gt(epi_marks %>%
        group_by(cluster) %>%
        filter(pct.1 > 0.5) %>%
        slice_max(avg_log2FC, n = 20) %>%
        arrange(cluster, -avg_log2FC)) %>%
        gt::tab_header("DEGs in human AECs (by cluster)")

    epi <- subset(epi,
        idents = c(4, # Clearly a cluster of ciliated epithelium
            6, # A cluster of leak-through goblet cells
            7), # A cluster of leak-through myeloid cells
        invert = TRUE)

    # Reprocess the subsetted cells
    epi <- DietSeurat(epi, assays = "RNA") %>%
        FindVariableFeatures() %>%
        ScaleData() %>%
        RunPCA(npcs = 20, verbose = FALSE) %>%
        harmony::RunHarmony(group.by.vars = c("type", "id", "method")) %>%
        RunUMAP(reduction = "harmony", dims = 1:20) %>%
        FindNeighbors(reduction = "umap", dims = 1:2) %>%
        FindClusters(resolution = 0.5)

    r_dim_plot(epi, split.by = "type")

    # Perform a global assessment of DEG
    epi_marks <- FindAllMarkers(epi)

    gt::gt(epi_marks %>%
        group_by(cluster) %>%
        filter(pct.1 > 0.5) %>%
        slice_max(avg_log2FC, n = 20) %>%
        arrange(cluster, -avg_log2FC)) %>%
        gt::tab_header("DEGs in human AECs (by cluster)")

    # Clean out clear osteosarcoma contamination
    # (top genes SATB2, COL1A1, RUNX2)
    epi <- subset(epi, idents = c(13, 15, 16, 17, 18, 19), invert = TRUE)

    # Reprocess
    epi <- DietSeurat(epi, assays = "RNA") %>%
        FindVariableFeatures() %>%
        ScaleData() %>%
        RunPCA(npcs = 20, verbose = FALSE) %>%
        harmony::RunHarmony(group.by.vars = c("type", "id", "method")) %>%
        RunUMAP(reduction = "harmony", dims = 1:20) %>%
        FindNeighbors(reduction = "umap", dims = 1:2) %>%
        FindClusters(resolution = 0.5)

    r_dim_plot(epi, split.by = "type")

    epi <- run_fdl(epi)
    r_dim_plot(epi, reduction = "fdl", split.by = "type")

    qs::qsave(epi, "output/rdata/human_epithelial.qs")
} else {
    epi <- qs::qread("output/rdata/human_epithelial.qs")
}
# Perform a global assessment of DEG
epi_marks <- FindAllMarkers(epi)
Calculating cluster 0
Calculating cluster 1
Calculating cluster 2
Calculating cluster 3
Calculating cluster 4
Calculating cluster 5
Calculating cluster 6
Calculating cluster 7
Calculating cluster 8
Calculating cluster 9
Calculating cluster 10
Calculating cluster 11
Calculating cluster 12
Calculating cluster 13
Code
gt::gt(epi_marks %>%
    group_by(cluster) %>%
    filter(pct.1 > 0.5) %>%
    slice_max(avg_log2FC, n = 20) %>%
    arrange(cluster, -avg_log2FC)) %>%
    gt::tab_header("DEGs in human AECs (by cluster)")
DEGs in human AECs (by cluster)
p_val avg_log2FC pct.1 pct.2 p_val_adj gene
0
0.000000e+00 5.1170103 0.745 0.044 0.000000e+00 BICC1
0.000000e+00 4.6881351 0.665 0.072 0.000000e+00 CP
0.000000e+00 4.3959742 0.801 0.064 0.000000e+00 CASC15
1.291445e-274 4.3837703 0.539 0.043 4.240590e-270 CREB5
0.000000e+00 4.3110779 0.589 0.040 0.000000e+00 INPP4B
4.108256e-298 4.1905716 0.509 0.027 1.348987e-293 FHOD3
1.069847e-306 4.1811778 0.550 0.035 3.512951e-302 MYO3B
0.000000e+00 4.1566604 0.911 0.084 0.000000e+00 LINC00511
0.000000e+00 3.9690891 0.667 0.048 0.000000e+00 EPHB2
2.944030e-299 3.9213681 0.567 0.042 9.667016e-295 PCSK6
0.000000e+00 3.9044732 0.759 0.064 0.000000e+00 FRMD5
0.000000e+00 3.8770462 0.637 0.054 0.000000e+00 DPYSL3
3.277019e-291 3.8460479 0.548 0.039 1.076042e-286 POT1-AS1
0.000000e+00 3.8301625 0.782 0.082 0.000000e+00 SLCO2A1
0.000000e+00 3.8117128 0.706 0.070 0.000000e+00 SEMA3C
0.000000e+00 3.8082208 0.952 0.188 0.000000e+00 EFNA5
0.000000e+00 3.7397062 0.734 0.069 0.000000e+00 DOCK3
0.000000e+00 3.7266213 0.652 0.053 0.000000e+00 PCED1B
0.000000e+00 3.6841275 0.830 0.100 0.000000e+00 GPC6
3.509233e-297 3.6621984 0.585 0.047 1.152292e-292 DENND1A
1
1.207570e-46 1.2495914 0.587 0.325 3.965176e-42 HLA-DRB5
2.976506e-52 1.2387294 0.636 0.347 9.773655e-48 MID1IP1
6.883582e-66 1.2344273 0.741 0.435 2.260293e-61 AQP3
6.689727e-94 1.1765112 0.878 0.479 2.196639e-89 FABP5
2.501355e-46 1.1708764 0.610 0.333 8.213449e-42 PLA2G1B
1.120570e-58 1.1532912 0.704 0.407 3.679503e-54 DCXR
4.651349e-44 1.1424419 0.600 0.342 1.527317e-39 PCBD1
2.383977e-36 1.1353646 0.537 0.300 7.828028e-32 HHIP
7.976047e-43 1.1303543 0.507 0.240 2.619015e-38 SCGB3A1
2.477269e-96 1.1207049 0.886 0.502 8.134361e-92 DBI
2.177226e-45 1.1049363 0.644 0.385 7.149138e-41 C12orf57
3.782909e-68 1.0908828 0.786 0.481 1.242156e-63 LDHB
5.757391e-32 1.0829688 0.533 0.306 1.890497e-27 SOD3
3.691710e-31 1.0811479 0.542 0.332 1.212210e-26 GABARAP
1.434213e-33 1.0494796 0.540 0.322 4.709381e-29 SELENOH
1.033020e-27 1.0484909 0.516 0.329 3.392024e-23 MRPL57
2.893195e-69 1.0401882 0.822 0.488 9.500095e-65 CXCL17
9.971779e-37 1.0400133 0.587 0.350 3.274333e-32 COX14
9.833969e-36 1.0375863 0.553 0.309 3.229082e-31 TCIM
1.008278e-41 1.0353065 0.662 0.410 3.310783e-37 IFITM2
2
6.707272e-67 2.1903322 1.000 0.640 2.202400e-62 SFTPC
4.557416e-84 1.6271379 0.996 0.577 1.496473e-79 SFTPA2
1.100603e-79 1.5601721 0.772 0.347 3.613939e-75 TTN
6.584936e-77 1.5502378 0.998 0.577 2.162230e-72 SFTPA1
9.347067e-53 1.3248560 0.645 0.305 3.069203e-48 STEAP4
2.677796e-49 1.3097735 0.580 0.268 8.792811e-45 FMO5
3.777143e-62 1.2170789 0.645 0.265 1.240263e-57 IRX3
1.557725e-38 1.1168855 0.551 0.271 5.114947e-34 CCDC141
7.639574e-44 1.1122827 0.564 0.252 2.508530e-39 CA2
3.472971e-70 1.1078130 0.919 0.591 1.140385e-65 PGC
1.826521e-45 1.1006913 0.570 0.253 5.997565e-41 ALOX15B
1.882275e-59 1.0941822 0.768 0.375 6.180638e-55 WIF1
4.899598e-66 1.0816789 0.795 0.366 1.608832e-61 SCD
4.070074e-47 1.0408960 0.643 0.289 1.336450e-42 HHIP
5.672046e-39 1.0362936 0.729 0.455 1.862473e-34 SECISBP2L
3.318475e-37 1.0318516 0.582 0.304 1.089655e-32 SOCS3
1.908291e-50 1.0296556 0.685 0.328 6.266063e-46 PLA2G1B
2.131024e-35 0.9766939 0.528 0.246 6.997429e-31 TMEM97
3.452655e-61 0.9725905 0.862 0.447 1.133714e-56 BTG1
2.164306e-46 0.9675648 0.766 0.423 7.106715e-42 MFSD2A
3
2.804195e-05 0.7791976 0.581 0.599 9.207854e-01 WFDC2
1.237598e-06 0.7575266 0.509 0.468 4.063777e-02 RPL17
1.049138e-21 0.6586823 0.768 0.672 3.444951e-17 RPL7A
3.447825e-21 0.6528076 0.749 0.680 1.132128e-16 RPL32
3.348050e-19 0.6492451 0.720 0.655 1.099366e-14 RPL26
5.747695e-09 0.6391256 0.637 0.591 1.887313e-04 RPL4
1.436936e-17 0.6339886 0.720 0.647 4.718324e-13 RPL29
3.333933e-20 0.6149462 0.744 0.685 1.094730e-15 RPS27A
5.671261e-21 0.5987045 0.821 0.732 1.862215e-16 RPL13
1.143554e-15 0.5951144 0.704 0.673 3.754974e-11 RPL14
3.657146e-03 0.5940889 0.504 0.511 1.000000e+00 CFL1
1.584684e-12 0.5939501 0.715 0.692 5.203467e-08 TMSB10
8.140235e-16 0.5930412 0.731 0.660 2.672928e-11 RPS4X
2.354166e-13 0.5927422 0.744 0.693 7.730141e-09 RPL13A
4.670609e-14 0.5876914 0.701 0.655 1.533641e-09 RPL15
1.241471e-11 0.5805256 0.739 0.713 4.076494e-07 CYB5A
1.844329e-17 0.5779844 0.741 0.676 6.056037e-13 RPL30
2.649213e-18 0.5758104 0.771 0.715 8.698957e-14 RPS8
1.284944e-12 0.5751534 0.717 0.657 4.219242e-08 RPL35
4.038538e-10 0.5751425 0.659 0.615 1.326094e-05 UBA52
4
1.690620e-60 1.2475792 0.801 0.504 5.551320e-56 FABP5
1.222642e-30 1.2455989 0.576 0.330 4.014667e-26 SCGB3A2
2.493187e-19 1.1600030 0.517 0.354 8.186630e-15 PLA2G1B
1.158222e-93 1.1452023 0.963 0.616 3.803136e-89 SLPI
8.655023e-24 1.1306839 0.579 0.399 2.841963e-19 SCD
1.259612e-32 1.0766268 0.680 0.478 4.136063e-28 SEC61G
3.586004e-86 1.0533903 0.935 0.636 1.177500e-81 RPL39
1.389120e-16 1.0415530 0.539 0.426 4.561314e-12 FASN
4.415253e-39 1.0350073 0.739 0.495 1.449792e-34 ATP5MC2
1.069487e-13 1.0340883 0.508 0.386 3.511766e-09 MT-ATP8
2.934346e-34 1.0330523 0.697 0.503 9.635219e-30 LDHB
7.353902e-17 1.0298784 0.548 0.444 2.414727e-12 NAA38
1.234057e-59 1.0227205 0.899 0.618 4.052149e-55 RPS26
2.598172e-77 1.0187922 0.921 0.633 8.531358e-73 RPS13
1.312757e-78 1.0143249 0.961 0.630 4.310567e-74 NPC2
2.628108e-33 1.0047495 0.728 0.512 8.629654e-29 CXCL17
7.703245e-45 1.0037711 0.781 0.553 2.529438e-40 TMA7
3.624414e-81 1.0007957 0.935 0.650 1.190113e-76 RPS15A
2.532891e-21 0.9993381 0.587 0.431 8.317000e-17 DCXR
1.194967e-49 0.9988806 0.829 0.563 3.923794e-45 RPS10
5
2.082299e-280 4.4447485 0.921 0.163 6.837438e-276 AGBL1
1.662430e-239 3.7098709 0.933 0.211 5.458754e-235 DLG2
4.162412e-150 3.6436417 0.630 0.118 1.366770e-145 LONRF3
5.737684e-181 3.3476701 0.848 0.213 1.884026e-176 ERBB4
8.162918e-185 3.1299869 0.787 0.162 2.680376e-180 SOX5
9.946422e-168 3.1183704 0.950 0.378 3.266007e-163 PTPRG
4.305819e-119 3.0680120 0.501 0.086 1.413859e-114 TMEM108
4.517383e-135 3.0178365 0.729 0.198 1.483328e-130 C8orf34
1.241497e-148 2.9004203 0.691 0.145 4.076578e-144 UST
4.464814e-104 2.8905018 0.542 0.124 1.466066e-99 CHST11
1.097901e-101 2.8379360 0.764 0.269 3.605067e-97 COL8A1
1.012033e-114 2.7329877 0.953 0.574 3.323113e-110 MACROD2
5.514286e-120 2.7206329 0.714 0.199 1.810671e-115 TOX
4.494065e-141 2.6898641 0.985 0.545 1.475671e-136 PDE4D
3.633708e-139 2.6854358 0.866 0.280 1.193164e-134 TMTC2
7.130488e-150 2.6432505 0.901 0.318 2.341367e-145 AUTS2
2.642012e-152 2.6368312 0.767 0.177 8.675312e-148 PRKG1
1.122057e-142 2.6336267 0.936 0.328 3.684388e-138 PELI1
6.540682e-159 2.6333491 0.848 0.237 2.147698e-154 GPHN
1.362246e-111 2.6306263 0.601 0.141 4.473072e-107 THRB
6
1.206702e-24 1.5517229 0.514 0.286 3.962328e-20 FMO5
1.380196e-26 1.4160449 0.634 0.375 4.532011e-22 TTN
4.951849e-11 1.1950473 0.960 0.826 1.625989e-06 MT-CO1
1.030682e-11 1.1839773 0.880 0.720 3.384349e-07 MT-ND1
1.862960e-05 1.1686287 0.908 0.812 6.117216e-01 MT-CO3
1.990273e-07 1.1372820 0.815 0.730 6.535262e-03 MT-ND3
3.218767e-32 1.1356320 0.877 0.626 1.056914e-27 LRRK2
3.100959e-07 1.0324751 0.895 0.796 1.018231e-02 MT-ATP6
1.826236e-08 1.0193620 0.886 0.765 5.996627e-04 MT-ND4
3.218097e-11 0.9871169 0.852 0.711 1.056694e-06 MT-ND2
9.131232e-07 0.9037143 0.911 0.779 2.998331e-02 MT-CYB
4.821076e-22 0.8993584 0.674 0.444 1.583048e-17 MFSD2A
1.138184e-16 0.8827187 0.582 0.395 3.737340e-12 CSF3R
4.438872e-10 0.8744253 0.800 0.661 1.457548e-05 MT-ND5
2.172993e-11 0.8009320 1.000 0.957 7.135240e-07 SFTPB
1.401623e-12 0.7963049 0.612 0.475 4.602369e-08 SECISBP2L
9.101127e-05 0.7833722 0.908 0.810 1.000000e+00 MT-CO2
1.818764e-17 0.7826578 0.662 0.495 5.972093e-13 DUOX1
1.850105e-05 0.7793888 0.545 0.529 6.075006e-01 ARGLU1
8.366686e-15 0.7532236 0.655 0.456 2.747285e-10 SDR16C5
7
1.195379e-294 4.5354836 0.763 0.064 3.925145e-290 SPOCK2
6.002263e-192 4.1572329 0.521 0.041 1.970903e-187 MS4A15
1.430935e-210 3.9075489 0.770 0.108 4.698618e-206 RTKN2
1.580932e-151 3.7664169 0.525 0.060 5.191149e-147 WFS1
1.119974e-93 3.2999056 0.518 0.104 3.677545e-89 SEMA3B
2.715285e-180 3.0420211 0.926 0.221 8.915911e-176 AGER
1.178053e-121 3.0258160 0.658 0.135 3.868254e-117 RGCC
9.135466e-117 2.8872368 0.735 0.193 2.999722e-112 MYL9
8.172254e-100 2.8843446 0.560 0.112 2.683441e-95 IL32
1.258485e-73 2.8789457 0.630 0.209 4.132361e-69 DENND3
1.999262e-164 2.8736634 0.949 0.279 6.564778e-160 CAV1
3.290813e-129 2.7703686 0.755 0.179 1.080571e-124 CEACAM6
3.117320e-72 2.6984299 0.673 0.253 1.023603e-67 ANKRD29
2.905537e-108 2.6431437 0.755 0.225 9.540620e-104 CAV2
1.090006e-77 2.6331590 0.615 0.186 3.579143e-73 GAS6
5.335611e-73 2.5592487 0.580 0.167 1.752001e-68 ANOS1
1.635819e-90 2.4872971 0.720 0.234 5.371376e-86 ANXA3
5.402183e-117 2.4256813 0.984 0.653 1.773861e-112 EMP2
3.980364e-68 2.4055974 0.521 0.136 1.306992e-63 COL4A2
1.152081e-67 2.3636069 0.615 0.203 3.782973e-63 IGFBP7
8
8.342544e-95 2.6934267 0.601 0.130 2.739358e-90 CHI3L1
4.564474e-104 2.3041357 0.702 0.164 1.498791e-99 S100A9
4.242133e-76 2.1673166 0.621 0.167 1.392947e-71 CHI3L2
2.177025e-56 1.9932029 0.532 0.154 7.148480e-52 SDF2L1
4.322393e-89 1.8631792 0.681 0.163 1.419301e-84 DMBT1
5.130194e-47 1.7060392 0.528 0.163 1.684551e-42 RASGRF1
3.578684e-60 1.5301571 0.738 0.257 1.175097e-55 NRGN
1.995853e-42 1.5287664 0.536 0.180 6.553582e-38 MTHFS
2.342127e-41 1.5217213 0.520 0.173 7.690610e-37 GGCT
3.451101e-54 1.4664916 0.738 0.301 1.133203e-49 MYDGF
2.990012e-57 1.4239398 0.810 0.362 9.818002e-53 AGR2
5.127171e-48 1.3640568 0.730 0.282 1.683558e-43 C11orf96
1.978064e-36 1.3556347 0.661 0.319 6.495172e-32 NNMT
5.300245e-43 1.3176684 0.698 0.304 1.740389e-38 EIF5A
5.853503e-39 1.3141952 0.613 0.245 1.922056e-34 RPL22L1
1.108434e-52 1.3124359 0.806 0.330 3.639653e-48 HLA-DRB5
2.870354e-61 1.3061367 0.899 0.413 9.425095e-57 IFITM2
1.571366e-49 1.2961476 0.762 0.302 5.159736e-45 HHIP
7.769782e-29 1.2803372 0.512 0.213 2.551286e-24 CHCHD7
1.700586e-31 1.2400203 0.524 0.207 5.584045e-27 RANBP1
9
1.257827e-33 1.3344064 0.538 0.180 4.130201e-29 PLIN2
1.101544e-51 1.2965619 0.586 0.154 3.617028e-47 HES1
1.940198e-53 1.2249173 0.690 0.188 6.370834e-49 LPL
2.055738e-39 1.1921728 0.624 0.205 6.750222e-35 SNHG25
2.191313e-44 1.1835296 0.790 0.283 7.195397e-40 C11orf96
1.489149e-39 1.1486309 0.514 0.145 4.889771e-35 VKORC1
3.928510e-43 1.1146934 0.562 0.155 1.289966e-38 TSPYL1
5.005159e-30 1.0885701 0.524 0.178 1.643494e-25 ANP32E
4.426380e-38 1.0768456 0.590 0.186 1.453446e-33 AKAP1
9.369834e-34 1.0654869 0.633 0.230 3.076679e-29 NORAD
9.276194e-35 1.0589003 0.538 0.166 3.045931e-30 FOXA2
1.246505e-33 1.0566856 0.600 0.207 4.093023e-29 PFKFB2
8.957686e-35 1.0563709 0.586 0.194 2.941346e-30 FHDC1
5.705058e-42 1.0476919 0.595 0.172 1.873313e-37 SLCO4C1
6.607910e-34 1.0471755 0.586 0.195 2.169773e-29 SLC27A3
3.444547e-34 1.0455022 0.571 0.183 1.131052e-29 SMIM30
2.109923e-31 1.0386204 0.567 0.189 6.928144e-27 PRXL2A
8.169328e-40 1.0301554 0.724 0.249 2.682481e-35 CRTAC1
1.282120e-31 1.0152358 0.657 0.246 4.209968e-27 PEG10
3.203776e-35 1.0116746 0.614 0.204 1.051992e-30 CTSS
10
8.079717e-113 3.1722255 0.828 0.235 2.653056e-108 AGER
6.838232e-89 2.8672017 0.803 0.295 2.245402e-84 CAV1
2.825151e-56 2.7016936 0.591 0.195 9.276666e-52 CEACAM6
7.874909e-74 2.1620801 0.916 0.661 2.585805e-69 EMP2
2.263071e-26 1.9466731 0.631 0.425 7.431019e-22 CYP4B1
1.179675e-14 1.8685140 0.502 0.373 3.873580e-10 ADIRF
3.126721e-16 1.7573801 0.562 0.430 1.026690e-11 CD151
1.031429e-14 1.6867294 0.502 0.352 3.386799e-10 TACSTD2
5.780738e-35 1.6809621 0.754 0.578 1.898163e-30 KRT7
3.661139e-21 1.5607886 0.665 0.577 1.202172e-16 KRT19
8.806570e-40 1.5569202 0.833 0.740 2.891725e-35 S100A10
1.163311e-08 1.3968026 0.537 0.498 3.819849e-04 TAGLN2
1.944392e-13 1.3320581 0.606 0.560 6.384606e-09 MYL12A
5.048841e-12 1.3035768 0.571 0.503 1.657838e-07 PRDX1
1.121823e-11 1.2766538 0.581 0.479 3.683618e-07 CLDN18
1.182714e-32 1.2556341 0.818 0.683 3.883559e-28 MYL6
2.768179e-28 1.1757299 0.783 0.690 9.089592e-24 TMSB10
7.442572e-40 1.1543519 0.901 0.758 2.443843e-35 ACTB
3.086415e-38 1.1305202 0.911 0.789 1.013455e-33 B2M
4.409104e-23 1.1267497 0.783 0.673 1.447773e-18 ANXA2
11
6.291372e-244 4.9473949 0.950 0.114 2.065835e-239 NCKAP5
3.356180e-144 4.3907640 0.508 0.044 1.102035e-139 MYO16-AS1
7.085020e-175 4.3513064 0.895 0.148 2.326437e-170 KHDRBS2
5.897070e-107 3.8996337 0.541 0.073 1.936362e-102 NYAP2
4.519884e-144 3.8628244 0.602 0.065 1.484149e-139 LINC01290
1.436693e-103 3.6678052 0.558 0.080 4.717524e-99 CCDC85A
8.836653e-136 3.6452622 0.785 0.138 2.901603e-131 GRK5
1.670935e-84 3.5316192 0.724 0.199 5.486683e-80 NTM
1.035770e-146 3.5105105 0.890 0.169 3.401054e-142 GPC5
4.931245e-90 3.4902798 0.552 0.090 1.619223e-85 COL4A1
7.519536e-104 3.3323266 0.746 0.157 2.469115e-99 GPM6A
2.729230e-101 3.3224401 0.608 0.098 8.961701e-97 SEMA5A
4.887766e-108 3.3176054 0.840 0.213 1.604947e-103 SCEL
2.201313e-90 3.2986635 0.624 0.121 7.228231e-86 WASF3
9.710719e-100 3.2839296 0.530 0.073 3.188612e-95 CPA6
2.734700e-117 3.2781967 0.823 0.182 8.979660e-113 MAP2
4.672942e-124 3.2520024 0.878 0.206 1.534407e-119 CLIC5
7.443555e-77 3.2448170 0.552 0.105 2.444166e-72 DCLK1
2.049661e-79 3.1762824 0.541 0.097 6.730266e-75 PLCL1
9.582662e-89 3.1595810 0.663 0.137 3.146563e-84 COL4A2
12
2.661378e-139 4.4171683 0.525 0.039 8.738901e-135 S100A2
2.611996e-165 4.4150431 0.590 0.042 8.576751e-161 MARCKS
7.095507e-143 4.1187039 0.791 0.108 2.329881e-138 KRT17
4.004631e-120 3.6984603 0.511 0.044 1.314961e-115 RPL17-C18orf32
4.739198e-99 3.6134351 0.619 0.087 1.556163e-94 RND3
6.579333e-133 3.5561033 0.547 0.044 2.160390e-128 H3-3B
1.195618e-114 3.4273945 0.504 0.044 3.925930e-110 OOEP
7.536017e-72 3.2583082 0.612 0.120 2.474527e-67 MMP7
5.481018e-76 3.2201538 0.547 0.087 1.799747e-71 GADD45A
1.381342e-69 3.1219420 0.547 0.090 4.535775e-65 TNC
1.323852e-45 2.7311382 0.561 0.144 4.347000e-41 CALD1
2.470784e-64 2.6737053 0.604 0.115 8.113068e-60 DPYSL3
9.223076e-60 2.6476521 0.647 0.156 3.028489e-55 COL1A1
8.921421e-72 2.5755137 0.511 0.076 2.929438e-67 FBXO32
1.356061e-68 2.4349076 0.784 0.209 4.452763e-64 IGFBP7
7.127402e-48 2.4325463 0.504 0.110 2.340354e-43 PPP1R14B
1.096133e-63 2.3685233 0.547 0.094 3.599264e-59 CREB5
1.409268e-55 2.3620489 0.590 0.128 4.627471e-51 EDN1
3.461057e-74 2.3198046 0.741 0.155 1.136473e-69 THBS1
5.392749e-56 2.3137530 0.755 0.225 1.770763e-51 SOX4
13
3.446515e-263 7.2773792 0.621 0.005 1.131698e-258 UBE2C
1.157610e-197 7.0487140 0.517 0.005 3.801127e-193 NUF2
4.816593e-162 6.6498172 0.759 0.017 1.581577e-157 NUSAP1
4.811538e-160 6.6008607 0.621 0.011 1.579917e-155 CENPK
1.671170e-154 6.3751794 0.586 0.010 5.487454e-150 TACC3
8.822374e-173 6.3618268 0.759 0.016 2.896915e-168 TOP2A
1.042861e-207 6.3277698 0.655 0.009 3.424337e-203 ANLN
2.914996e-179 6.2927648 0.621 0.009 9.571681e-175 CENPF
7.413916e-169 6.1875281 0.517 0.006 2.434434e-164 BIRC5
1.291621e-152 6.1700306 0.690 0.015 4.241168e-148 TYMS
5.666724e-131 6.1282939 0.621 0.014 1.860725e-126 CDK1
3.570839e-225 5.8983532 0.690 0.009 1.172521e-220 PRC1
9.229436e-149 5.8039330 0.552 0.009 3.030578e-144 MKI67
1.229035e-109 5.7557903 0.586 0.016 4.035660e-105 PCLAF
5.884541e-84 5.5394675 0.517 0.016 1.932248e-79 C21orf58
9.064692e-99 5.3635212 0.586 0.018 2.976482e-94 CENPW
2.112017e-70 4.9197974 0.552 0.023 6.935018e-66 TK1
9.112436e-66 4.9119524 0.552 0.025 2.992160e-61 UBE2T
4.382916e-62 4.5799548 0.517 0.023 1.439174e-57 BRCA1
1.625092e-49 4.3601934 0.517 0.029 5.336154e-45 KIF20B
Code
# Show genes from the basaloid signature identified in:
# https://doi.org/10.1038/s41467-022-33193-0
# https://doi.org/10.1038/s41467-020-17358-3
# https://doi.org/10.1126/sciadv.aba1983
r_feature_plot(epi, "FOSL1", split.by = "type")

Code
r_feature_plot(epi, "CD24", split.by = "type")

Code
r_feature_plot(epi, "ITGB8", split.by = "type")

Code
r_feature_plot(epi, "KRT17", split.by = "type")

Code
# Show markers of activated AEC2
r_feature_plot(epi, "AREG", split.by = "type")

Code
r_feature_plot(epi, "TP63", min.cutoff = 1, split.by = "type")

Code
r_feature_plot(epi, "KRT5", split.by = "type")

Code
# Assign cell types
epi2 <- RenameIdents(epi,
    `0` = "Basaloid",
    `1` = "AEC2",
    `2` = "AEC2",
    `3` = "AEC2",
    `4` = "AEC2",
    `5` = "Basaloid",
    `6` = "AEC2",
    `7` = "AEC1",
    `8` = "AEC2",
    `9` = "AEC2",
    `10` = "AEC1",
    `11` = "Basaloid",
    `12` = "Basaloid",
    `13` = "cAEC2")

epi2$epi_type <- Idents(epi2)

r_dim_plot(epi2,
    group.by = "epi_type",
    split.by = "type")

Code
# Consolidate into a summary dotplot
epi_genes <- c(
    "SFTPA1", "SFTPC", "PGC", # AEC2
    "AREG", "PIGR", # pAEC2
    "TOP2A", "MKI67", "CENPF", # cAEC2
    "KRT17", "TNC", "CDH2", "FN1", "TP63",  # Basaloid
    "AGER", "CAV1", "RTKN2") # AEC1

epi2$epi_type <- factor(epi2$epi_type, levels = c("AEC1", "Basaloid", "cAEC2", "pAEC2", "AEC2", 0:20))

DotPlot(epi2,
    features = epi_genes,
    group.by = "epi_type",
    dot.scale = 8,
    cols = "RdBu") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
    coord_fixed()
Warning: Scaling data with a low number of groups may produce misleading
results

Subcluster macrophages

Code
if(!file.exists("output/rdata/human_macs.qs")) {
    macs <- subset(lungs, subset = manual_assignments_v2 == "Myeloid")
    macs[["RNA"]]$scale.data <- NULL

    macs <- DietSeurat(macs, assays = "RNA") %>%
        FindVariableFeatures() %>%
        ScaleData() %>%
        RunPCA(npcs = 30, verbose = FALSE) %>%
        harmony::RunHarmony(
            group.by.vars = c("id", "method", "src")) %>%
        RunUMAP(reduction = "harmony", dims = 1:30) %>%
        FindNeighbors(reduction = "harmony") %>%
        FindClusters(resolution = 0.4)

    p1 <- r_dim_plot(macs)
    p2 <- r_dim_plot(subset(macs, subset = type == "Healthy"),
        group.by = "cell_type",
        repel = TRUE,
        label.box = TRUE)
    p3 <- r_dim_plot(subset(macs, subset = type == "Metastasis"),
        group.by = "annotations",
        repel = TRUE,
        label.box = TRUE)
    p1 + p2 + p3

    # Identify clusters by DEGs
    mac_marks <- FindAllMarkers(macs)

    gt::gt(mac_marks %>%
        group_by(cluster) %>%
        filter(pct.1 > 0.5) %>%
        slice_max(avg_log2FC, n = 20) %>%
        arrange(cluster, -avg_log2FC)) %>%
        gt::tab_header("DEGs in human macrophages (by cluster)")

    # Remove contaminating cells
    macs <- subset(macs,
        idents = c(
            "6", # T cells
            "8", # Lymphocytes
            "13"), # Mast cells
        invert = TRUE)

    # Redo the UMAP
    macs <- FindVariableFeatures(macs) %>%
        ScaleData() %>%
        RunPCA(npcs = 30, verbose = FALSE) %>%
        harmony::RunHarmony(
            group.by.vars = c("id", "method")) %>%
        RunUMAP(reduction = "harmony", dims = 1:30) %>%
        FindNeighbors(reduction = "harmony", k.param = 10) %>%
        FindClusters(resolution = 0.8)

    macs <- run_fdl(macs)

    p1 <- r_dim_plot(subset(macs, subset = type == "Healthy"),
        # reduction = "fdl",
        group.by = "ann_level_4",
        repel = TRUE,
        label.box = TRUE)
    p2 <- r_dim_plot(subset(macs, subset = type == "Healthy"),
        # reduction = "fdl",
        group.by = "seurat_clusters",
        repel = TRUE,
        label.box = TRUE)
    p3 <- r_dim_plot(subset(macs, subset = type == "Metastasis"),
        # reduction = "fdl",
        group.by = "annotations",
        repel = TRUE,
        label.box = TRUE)
    p2 + p1 + p3

    r_feature_plot(macs, "MARCO", split.by = "type")
    r_feature_plot(macs, "CD68", split.by = "type")
    r_feature_plot(macs, "MRC1", split.by = "type")
    r_feature_plot(macs, "CCL3", split.by = "type")
    r_feature_plot(macs, "TREM2", split.by = "type")
    r_feature_plot(macs, "CD9", split.by = "type")
    r_feature_plot(macs, "SPP1", split.by = "type")
    r_feature_plot(macs, "GPNMB", split.by = "type")
    r_feature_plot(macs, "FABP5", split.by = "type")

    macs_marks <- FindAllMarkers(macs)

    gt::gt(macs_marks %>%
        group_by(cluster) %>%
        filter(pct.1 > 0.5) %>%
        slice_max(avg_log2FC, n = 20) %>%
        arrange(cluster, -avg_log2FC)) %>%
        gt::tab_header("DEGs in human mononuclear cells (by cluster)")

    # Remove contaminating carryover
    macs <- subset(macs,
        idents = c("2", "8", "14", "16"),
        invert = TRUE)

    macs <- FindVariableFeatures(macs) %>%
        ScaleData() %>%
        RunPCA(npcs = 30, verbose = FALSE) %>%
        harmony::RunHarmony(
            group.by.vars = c("id", "method")) %>%
        RunUMAP(reduction = "harmony", dims = 1:30) %>%
        FindNeighbors(reduction = "harmony", k.param = 10) %>%
        FindClusters(resolution = 0.8)

    macs <- run_fdl(macs)

    r_dim_plot(macs, split.by = "type", repel = TRUE)
    r_dim_plot(macs, split.by = "type", reduction = "fdl", repel = TRUE, label.box = TRUE)

    macs <- subset(macs,
        idents = c("14"), # Doublets
        invert = TRUE)

    qs::qsave(macs, "output/rdata/human_macs.qs")
} else {
    macs <- qs::qread("output/rdata/human_macs.qs")
}

# Make final cell type annotations
if(!file.exists("misc/human_macs_assignments.qs")) {
    # Lay out the known annotations
    p1 <- r_dim_plot(subset(macs, subset = type == "Healthy"),
        # reduction = "fdl",
        group.by = "ann_level_4",
        repel = TRUE,
        label.box = TRUE)
    p2 <- r_dim_plot(subset(macs, subset = type == "Healthy"),
        # reduction = "fdl",
        group.by = "seurat_clusters",
        repel = TRUE,
        label.box = TRUE)
    p3 <- r_dim_plot(subset(macs, subset = type == "Metastasis"),
        # reduction = "fdl",
        group.by = "annotations",
        repel = TRUE,
        label.box = TRUE)
    p2 + p1 + p3

    macs_marks <- FindAllMarkers(macs)

    gt::gt(macs_marks %>%
        group_by(cluster) %>%
        filter(pct.1 > 0.5) %>%
        slice_max(avg_log2FC, n = 20) %>%
        arrange(cluster, -avg_log2FC)) %>%
        gt::tab_header("DEGs in human mononuclear cells (by cluster)")

    # These papers were helpful for identifying cell types:
    # https://doi.org/10.1002%2Fctm2.1199
    # https://doi.org/10.4049/jimmunol.2100094
    # https://doi.org/10.1038/s41590-023-01468-3
    # https://doi.org/10.1126%2Fscience.aah4573
    macs <- RenameIdents(macs,
        "0" = "TAMs",
        "1" = "TAMs",
        "2" = "Alveolar",
        "3" = "Monocyte",
        "4" = "Alveolar",
        "5" = "Interstitial",
        "6" = "Scar-TAMs",
        "7" = "DC2",
        "8" = "Monocyte",
        "9" = "Inflammatory-TAMs",
        "10" = "Inflammatory-TAMs",
        "11" = "IFN-TAMs",
        "12" = "Osteoclast-TAMs",
        "13" = "Alveolar",
        "14" = "DC1")

    r_dim_plot(macs, split.by = "type")

    # Fix the monocytes - find the appropriate subclusters
    macs <- FindSubCluster(macs,
        "Monocyte",
        subcluster.name = "monos_subcluster",
        graph.name = "RNA_snn",
        resolution = 0.4)

    Idents(macs) <- macs$monos_subcluster

    r_dim_plot(macs, split.by = "type")

    monos_marks <- subset(macs2, macs_assignment == "Monocyte") %>%
        FindAllMarkers()

    gt::gt(monos_marks %>%
        group_by(cluster) %>%
        filter(pct.1 > 0.5) %>%
        slice_max(avg_log2FC, n = 30) %>%
        arrange(cluster, -avg_log2FC)) %>%
        gt::tab_header("DEGs in human mononuclear cells (by cluster)")

    macs <- RenameIdents(macs,
        "Monocyte_0" = "Inflammatory-TAMs",
        "Monocyte_1" = "ncMonocyte",
        "Monocyte_2" = "Inflammatory-TAMs",
        "Monocyte_3" = "cMonocyte",
        "Monocyte_4" = "Inflammatory-TAMs",
        "Monocyte_5" = "Inflammatory-TAMs")

    macs$macs_assignment <- Idents(macs)

    qs::qsave(macs$macs_assignment, "misc/human_macs_assignments.qs")
} else {
    macs$macs_assignment <- qs::qread("misc/human_macs_assignments.qs")
}

macs$macs_assignment <- factor(macs$macs_assignment, levels = c(
    "Osteoclast-TAMs",
    "Scar-TAMs",
    "TAMs",
    "IFN-TAMs",
    "Inflammatory-TAMs",
    "Interstitial",
    "Alveolar",
    "TIMs",
    "cMonocyte",
    "ncMonocyte",
    "DC1",
    "DC2"))

r_dim_plot(macs,
    split.by = "type",
    group.by = "macs_assignment",
    repel = TRUE)

Code
macs2 <- subset(macs, macs_assignment %in% c("Interstitial", "Alveolar"), invert = TRUE)

# Need to add https://doi.org/10.1126/sciimmunol.add8945 to scarMac citations
mac_features <- c(
    "MMP9", "CKB", "CTSK", # Osteoclast-TAMs
    "CD9", "SPP1", "TREM2", "GPNMB", # Scar-Macs
    "MERTK", "FOLR2", "SELENOP", # TAMs
    "CXCL10", "IFIT2", # IFN-TAMs
    "IL1B", "CXCL8", # Inflammatory
    # TIMs
    "S100A12", # cMonocyte
    "S100A8", "S100A9", # Monocyte
    "FCGR3A", # ncMonocyte
    "IRF8", "IDO1",  # DC1
    "FCER1A", "CD1C") # DC2

DotPlot(macs2,
    features = mac_features,
    group.by = "macs_assignment",
    cols = "RdBu",
    col.max = 1.5,
    col.min = -1) +
    scale_y_discrete(limits = rev) +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
    coord_fixed()

Code
qs::qsave(macs, "output/rdata/human_macs_final.qs")

Read in and Process snRNA Data

Sample Info

Code
sample_info <- tribble(~sample_id,          ~treatment, ~species,
                          "S0283", "Late Nintedanib",  "mouse",
                          "S0284",  "Early Nintedanib",  "mouse",
                          "S0291",       "new control",  "mouse")
sample_info <- column_to_rownames(sample_info, "sample_id")

#save sample_info
write.table(sample_info, "input/sample_info.tsv")

These samples contain single nuclei transcriptomic data for three different samples from a particular study. Each sample was injected with F420 cells, and then was either not treated (S0291), treated with nintedanib one day after injection (S0284), or treated with nintedanib 28 days after injection (S0283).

Read in Data

I’m now going to read in the samples from the sample info table and create Seurat objects from them.

Code
path0 <- "/home/gdrobertslab/lab/Counts_2/"
path1 <- "/filtered_feature_bc_matrix/"

sample_list <- list()
for (id in rownames(sample_info)) {
    tmp_sobj <- tenx_load_qc(paste0(path0, id, path1),
                             violin_plot = FALSE,
                             species_pattern = "",
                             mt_pattern = "^mt-")

    tmp_sobj$treatment <- sample_info[id, ]$treatment
    tmp_sobj$sample_id <- id
    sample_list[[id]] <- tmp_sobj
}

Quality Control

I’m going to subset out dead/dying cells and likely multiplets based on the number of RNA fragments in each cell as well as the percentage of reads that are from mitochondrial genes.

Code
qc_plots <- lapply(sample_list,
                   feature_hist,
                   features = c("nCount_RNA", "percent.mt"))

maketabs(qc_plots)

Since the samples are nuclei-sequenced the percent of mitochondrial reads is miniscule, and it doesn’t make sense to subset with respect to this value so I will only set cutoffs based on nCount_RNA.

Code
cutoff_table <-
    tribble(~sample_id,     ~feature, ~min_val, ~max_val,
            "S0283",    "nCount_RNA",     1000,    10000,
            "S0284",    "nCount_RNA",     1000,     9000,
            "S0291",    "nCount_RNA",     1000,    25000) %>%
    column_to_rownames("sample_id")

#loop through each sample, subset, and create a feature hist with cutoffs
plts_w_cutoffs <- list()
for (id in names(sample_list)) {
    tmp_cuts <- cutoff_table[id, ]

    #add plot to list
    plts_w_cutoffs[[id]] <- feature_hist(sample_list[[id]],
                                         features = "nCount_RNA",
                                         cutoff_table = tmp_cuts)

    #subset object
    sample_list[[id]] <-
        subset(sample_list[[id]],
               nCount_RNA %in% c(tmp_cuts$min_val:tmp_cuts$max_val)) %>%
        process_seurat()

}
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Code
maketabs(plts_w_cutoffs)

Another way of checking the overall quality of the sample is by looking at how it clusters. If there is just one giant cluster then it likely means the sample is worse, and if there are separate clusters (which likely are individual cell types) then the sample is of good quality.

Code
umap_list <- lapply(sample_list, DimPlot)

maketabs(umap_list)

Each dataset looks to be good quality based on the clustering.

Merge snRNA Datasets

Consider Integrating

I’m now going to merge and then annotate the dataset.

Code
merged_samples <- merge(sample_list[[1]], sample_list[-1]) %>%
    JoinLayers() %>%
    process_seurat()
Warning: Some cell names are duplicated across objects provided. Renaming to
enforce unique cell names.
Warning: Different features in new layer data than already exists for
scale.data
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Code
DimPlot(merged_samples, group.by = "sample_id")

The data looks fairly well integrated. All clusters contain cells from each sample.

Annotate Single-Nucleus Data

I’m going to use a snRNA-seq dataset as my reference for cell type annotations. The reference dataset is from “Single-Nucleus RNA-Sequencing Profiling of Mouse Lung. Reduced Dissociation Bias and Improved Rare Cell-Type Detection Compared with Single-Cell RNA Sequencing”. The GEO accession number is GSE145998, and the metadata was received upon request from the authors.

Code
wget -P input https://ftp.ncbi.nlm.nih.gov/geo/series/GSE145nnn/GSE145998/suppl/GSE145998%5FCtrl%5Fmouse%5FSnuc%5Fcomb.dge.txt.gz

gunzip input/GSE145998_Ctrl_mouse_Snuc_comb.dge.txt.gz
--2024-09-17 08:23:58--  https://ftp.ncbi.nlm.nih.gov/geo/series/GSE145nnn/GSE145998/suppl/GSE145998%5FCtrl%5Fmouse%5FSnuc%5Fcomb.dge.txt.gz
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.13, 130.14.250.31, 130.14.250.7, ...
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.13|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 34350155 (33M) [application/x-gzip]
Saving to: ‘input/GSE145998_Ctrl_mouse_Snuc_comb.dge.txt.gz’

     0K .......... .......... .......... .......... ..........  0% 1.70M 19s
    50K .......... .......... .......... .......... ..........  0% 3.11M 15s
   100K .......... .......... .......... .......... ..........  0% 22.2M 10s
   150K .......... .......... .......... .......... ..........  0% 40.1M 8s
   200K .......... .......... .......... .......... ..........  0% 3.32M 8s
   250K .......... .......... .......... .......... ..........  0% 35.9M 7s
   300K .......... .......... .......... .......... ..........  1% 45.3M 6s
   350K .......... .......... .......... .......... ..........  1% 41.0M 5s
   400K .......... .......... .......... .......... ..........  1% 3.84M 6s
   450K .......... .......... .......... .......... ..........  1% 37.8M 5s
   500K .......... .......... .......... .......... ..........  1% 41.6M 5s
   550K .......... .......... .......... .......... ..........  1% 43.8M 5s
   600K .......... .......... .......... .......... ..........  1% 35.0M 4s
   650K .......... .......... .......... .......... ..........  2% 40.0M 4s
   700K .......... .......... .......... .......... ..........  2% 42.3M 4s
   750K .......... .......... .......... .......... ..........  2% 44.1M 4s
   800K .......... .......... .......... .......... ..........  2% 5.75M 4s
   850K .......... .......... .......... .......... ..........  2% 38.1M 4s
   900K .......... .......... .......... .......... ..........  2% 37.6M 3s
   950K .......... .......... .......... .......... ..........  2% 34.4M 3s
  1000K .......... .......... .......... .......... ..........  3% 39.1M 3s
  1050K .......... .......... .......... .......... ..........  3% 47.5M 3s
  1100K .......... .......... .......... .......... ..........  3% 48.7M 3s
  1150K .......... .......... .......... .......... ..........  3% 50.5M 3s
  1200K .......... .......... .......... .......... ..........  3% 37.9M 3s
  1250K .......... .......... .......... .......... ..........  3% 47.8M 3s
  1300K .......... .......... .......... .......... ..........  4% 51.0M 3s
  1350K .......... .......... .......... .......... ..........  4% 48.1M 2s
  1400K .......... .......... .......... .......... ..........  4% 45.5M 2s
  1450K .......... .......... .......... .......... ..........  4% 44.4M 2s
  1500K .......... .......... .......... .......... ..........  4% 47.1M 2s
  1550K .......... .......... .......... .......... ..........  4% 56.5M 2s
  1600K .......... .......... .......... .......... ..........  4% 41.4M 2s
  1650K .......... .......... .......... .......... ..........  5% 50.5M 2s
  1700K .......... .......... .......... .......... ..........  5% 44.6M 2s
  1750K .......... .......... .......... .......... ..........  5% 50.7M 2s
  1800K .......... .......... .......... .......... ..........  5% 45.5M 2s
  1850K .......... .......... .......... .......... ..........  5% 47.6M 2s
  1900K .......... .......... .......... .......... ..........  5% 47.3M 2s
  1950K .......... .......... .......... .......... ..........  5% 42.3M 2s
  2000K .......... .......... .......... .......... ..........  6% 44.5M 2s
  2050K .......... .......... .......... .......... ..........  6% 51.6M 2s
  2100K .......... .......... .......... .......... ..........  6% 50.4M 2s
  2150K .......... .......... .......... .......... ..........  6% 1.86M 2s
  2200K .......... .......... .......... .......... ..........  6% 31.4M 2s
  2250K .......... .......... .......... .......... ..........  6% 38.1M 2s
  2300K .......... .......... .......... .......... ..........  7% 40.6M 2s
  2350K .......... .......... .......... .......... ..........  7% 45.6M 2s
  2400K .......... .......... .......... .......... ..........  7% 39.6M 2s
  2450K .......... .......... .......... .......... ..........  7% 40.0M 2s
  2500K .......... .......... .......... .......... ..........  7% 48.0M 2s
  2550K .......... .......... .......... .......... ..........  7% 45.3M 2s
  2600K .......... .......... .......... .......... ..........  7% 41.4M 2s
  2650K .......... .......... .......... .......... ..........  8% 43.4M 2s
  2700K .......... .......... .......... .......... ..........  8% 43.6M 2s
  2750K .......... .......... .......... .......... ..........  8% 42.9M 2s
  2800K .......... .......... .......... .......... ..........  8% 39.8M 2s
  2850K .......... .......... .......... .......... ..........  8% 46.3M 2s
  2900K .......... .......... .......... .......... ..........  8% 44.4M 2s
  2950K .......... .......... .......... .......... ..........  8% 42.4M 2s
  3000K .......... .......... .......... .......... ..........  9% 36.8M 2s
  3050K .......... .......... .......... .......... ..........  9% 45.8M 2s
  3100K .......... .......... .......... .......... ..........  9% 48.0M 2s
  3150K .......... .......... .......... .......... ..........  9% 46.0M 2s
  3200K .......... .......... .......... .......... ..........  9% 39.7M 2s
  3250K .......... .......... .......... .......... ..........  9% 44.9M 2s
  3300K .......... .......... .......... .......... ..........  9% 46.4M 2s
  3350K .......... .......... .......... .......... .......... 10% 49.1M 2s
  3400K .......... .......... .......... .......... .......... 10% 47.5M 2s
  3450K .......... .......... .......... .......... .......... 10% 48.5M 2s
  3500K .......... .......... .......... .......... .......... 10% 49.4M 2s
  3550K .......... .......... .......... .......... .......... 10% 47.2M 2s
  3600K .......... .......... .......... .......... .......... 10% 39.2M 1s
  3650K .......... .......... .......... .......... .......... 11% 50.7M 1s
  3700K .......... .......... .......... .......... .......... 11% 51.1M 1s
  3750K .......... .......... .......... .......... .......... 11% 42.7M 1s
  3800K .......... .......... .......... .......... .......... 11%  449K 2s
  3850K .......... .......... .......... .......... .......... 11% 18.8M 2s
  3900K .......... .......... .......... .......... .......... 11% 30.9M 2s
  3950K .......... .......... .......... .......... .......... 11% 31.6M 2s
  4000K .......... .......... .......... .......... .......... 12% 28.5M 2s
  4050K .......... .......... .......... .......... .......... 12% 40.5M 2s
  4100K .......... .......... .......... .......... .......... 12% 39.1M 2s
  4150K .......... .......... .......... .......... .......... 12% 45.6M 2s
  4200K .......... .......... .......... .......... .......... 12% 41.0M 2s
  4250K .......... .......... .......... .......... .......... 12% 44.7M 2s
  4300K .......... .......... .......... .......... .......... 12% 53.7M 2s
  4350K .......... .......... .......... .......... .......... 13% 47.7M 2s
  4400K .......... .......... .......... .......... .......... 13% 41.6M 2s
  4450K .......... .......... .......... .......... .......... 13% 49.1M 2s
  4500K .......... .......... .......... .......... .......... 13% 44.8M 2s
  4550K .......... .......... .......... .......... .......... 13% 48.2M 2s
  4600K .......... .......... .......... .......... .......... 13% 41.1M 2s
  4650K .......... .......... .......... .......... .......... 14% 47.0M 2s
  4700K .......... .......... .......... .......... .......... 14% 48.3M 2s
  4750K .......... .......... .......... .......... .......... 14% 45.6M 2s
  4800K .......... .......... .......... .......... .......... 14% 41.4M 2s
  4850K .......... .......... .......... .......... .......... 14% 48.5M 2s
  4900K .......... .......... .......... .......... .......... 14% 49.6M 2s
  4950K .......... .......... .......... .......... .......... 14% 47.2M 2s
  5000K .......... .......... .......... .......... .......... 15% 43.9M 2s
  5050K .......... .......... .......... .......... .......... 15% 53.2M 2s
  5100K .......... .......... .......... .......... .......... 15% 47.6M 2s
  5150K .......... .......... .......... .......... .......... 15% 48.8M 2s
  5200K .......... .......... .......... .......... .......... 15% 45.0M 2s
  5250K .......... .......... .......... .......... .......... 15% 46.4M 2s
  5300K .......... .......... .......... .......... .......... 15% 55.1M 2s
  5350K .......... .......... .......... .......... .......... 16% 48.4M 2s
  5400K .......... .......... .......... .......... .......... 16% 44.0M 2s
  5450K .......... .......... .......... .......... .......... 16% 55.1M 2s
  5500K .......... .......... .......... .......... .......... 16% 50.1M 2s
  5550K .......... .......... .......... .......... .......... 16% 56.2M 2s
  5600K .......... .......... .......... .......... .......... 16% 44.4M 2s
  5650K .......... .......... .......... .......... .......... 16% 44.8M 2s
  5700K .......... .......... .......... .......... .......... 17% 53.9M 2s
  5750K .......... .......... .......... .......... .......... 17% 50.8M 2s
  5800K .......... .......... .......... .......... .......... 17% 36.8M 2s
  5850K .......... .......... .......... .......... .......... 17% 49.0M 2s
  5900K .......... .......... .......... .......... .......... 17% 55.8M 2s
  5950K .......... .......... .......... .......... .......... 17% 48.3M 2s
  6000K .......... .......... .......... .......... .......... 18% 39.5M 2s
  6050K .......... .......... .......... .......... .......... 18% 46.5M 2s
  6100K .......... .......... .......... .......... .......... 18% 48.8M 2s
  6150K .......... .......... .......... .......... .......... 18% 49.9M 2s
  6200K .......... .......... .......... .......... .......... 18% 41.4M 2s
  6250K .......... .......... .......... .......... .......... 18% 49.6M 2s
  6300K .......... .......... .......... .......... .......... 18% 43.2M 2s
  6350K .......... .......... .......... .......... .......... 19% 47.0M 2s
  6400K .......... .......... .......... .......... .......... 19% 42.6M 1s
  6450K .......... .......... .......... .......... .......... 19% 48.0M 1s
  6500K .......... .......... .......... .......... .......... 19% 48.2M 1s
  6550K .......... .......... .......... .......... .......... 19% 44.8M 1s
  6600K .......... .......... .......... .......... .......... 19% 43.1M 1s
  6650K .......... .......... .......... .......... .......... 19% 47.0M 1s
  6700K .......... .......... .......... .......... .......... 20% 51.9M 1s
  6750K .......... .......... .......... .......... .......... 20% 46.9M 1s
  6800K .......... .......... .......... .......... .......... 20% 38.0M 1s
  6850K .......... .......... .......... .......... .......... 20% 47.5M 1s
  6900K .......... .......... .......... .......... .......... 20% 47.0M 1s
  6950K .......... .......... .......... .......... .......... 20% 47.4M 1s
  7000K .......... .......... .......... .......... .......... 21% 41.9M 1s
  7050K .......... .......... .......... .......... .......... 21% 43.2M 1s
  7100K .......... .......... .......... .......... .......... 21% 48.8M 1s
  7150K .......... .......... .......... .......... .......... 21% 47.5M 1s
  7200K .......... .......... .......... .......... .......... 21% 40.2M 1s
  7250K .......... .......... .......... .......... .......... 21% 50.3M 1s
  7300K .......... .......... .......... .......... .......... 21% 51.1M 1s
  7350K .......... .......... .......... .......... .......... 22% 47.1M 1s
  7400K .......... .......... .......... .......... .......... 22%  184K 2s
  7450K .......... .......... .......... .......... .......... 22% 24.6M 2s
  7500K .......... .......... .......... .......... .......... 22% 32.1M 2s
  7550K .......... .......... .......... .......... .......... 22% 35.8M 2s
  7600K .......... .......... .......... .......... .......... 22% 36.6M 2s
  7650K .......... .......... .......... .......... .......... 22% 47.5M 2s
  7700K .......... .......... .......... .......... .......... 23% 47.1M 2s
  7750K .......... .......... .......... .......... .......... 23% 42.0M 2s
  7800K .......... .......... .......... .......... .......... 23% 39.3M 2s
  7850K .......... .......... .......... .......... .......... 23% 49.6M 2s
  7900K .......... .......... .......... .......... .......... 23% 45.7M 2s
  7950K .......... .......... .......... .......... .......... 23% 45.6M 2s
  8000K .......... .......... .......... .......... .......... 23% 35.3M 2s
  8050K .......... .......... .......... .......... .......... 24% 48.7M 2s
  8100K .......... .......... .......... .......... .......... 24% 41.6M 2s
  8150K .......... .......... .......... .......... .......... 24% 41.6M 2s
  8200K .......... .......... .......... .......... .......... 24% 34.8M 2s
  8250K .......... .......... .......... .......... .......... 24% 49.2M 2s
  8300K .......... .......... .......... .......... .......... 24% 42.6M 2s
  8350K .......... .......... .......... .......... .......... 25% 44.1M 2s
  8400K .......... .......... .......... .......... .......... 25% 39.9M 2s
  8450K .......... .......... .......... .......... .......... 25% 44.0M 2s
  8500K .......... .......... .......... .......... .......... 25% 42.7M 2s
  8550K .......... .......... .......... .......... .......... 25% 45.5M 2s
  8600K .......... .......... .......... .......... .......... 25% 38.7M 2s
  8650K .......... .......... .......... .......... .......... 25% 47.1M 2s
  8700K .......... .......... .......... .......... .......... 26% 41.9M 2s
  8750K .......... .......... .......... .......... .......... 26% 43.5M 2s
  8800K .......... .......... .......... .......... .......... 26% 37.0M 2s
  8850K .......... .......... .......... .......... .......... 26% 47.7M 2s
  8900K .......... .......... .......... .......... .......... 26% 43.2M 2s
  8950K .......... .......... .......... .......... .......... 26% 42.9M 2s
  9000K .......... .......... .......... .......... .......... 26% 39.2M 2s
  9050K .......... .......... .......... .......... .......... 27% 46.4M 2s
  9100K .......... .......... .......... .......... .......... 27% 50.6M 2s
  9150K .......... .......... .......... .......... .......... 27% 46.5M 2s
  9200K .......... .......... .......... .......... .......... 27% 37.5M 2s
  9250K .......... .......... .......... .......... .......... 27% 47.1M 2s
  9300K .......... .......... .......... .......... .......... 27% 43.9M 2s
  9350K .......... .......... .......... .......... .......... 28% 45.7M 2s
  9400K .......... .......... .......... .......... .......... 28% 39.5M 2s
  9450K .......... .......... .......... .......... .......... 28% 44.3M 2s
  9500K .......... .......... .......... .......... .......... 28% 44.2M 2s
  9550K .......... .......... .......... .......... .......... 28% 43.0M 2s
  9600K .......... .......... .......... .......... .......... 28% 42.1M 2s
  9650K .......... .......... .......... .......... .......... 28% 47.1M 2s
  9700K .......... .......... .......... .......... .......... 29% 43.8M 2s
  9750K .......... .......... .......... .......... .......... 29% 41.5M 2s
  9800K .......... .......... .......... .......... .......... 29% 41.3M 2s
  9850K .......... .......... .......... .......... .......... 29% 43.5M 2s
  9900K .......... .......... .......... .......... .......... 29% 44.2M 2s
  9950K .......... .......... .......... .......... .......... 29% 45.3M 2s
 10000K .......... .......... .......... .......... .......... 29% 36.9M 2s
 10050K .......... .......... .......... .......... .......... 30% 46.4M 2s
 10100K .......... .......... .......... .......... .......... 30% 42.9M 2s
 10150K .......... .......... .......... .......... .......... 30% 46.0M 2s
 10200K .......... .......... .......... .......... .......... 30% 40.4M 2s
 10250K .......... .......... .......... .......... .......... 30% 42.6M 2s
 10300K .......... .......... .......... .......... .......... 30% 47.7M 2s
 10350K .......... .......... .......... .......... .......... 31% 47.8M 2s
 10400K .......... .......... .......... .......... .......... 31% 40.0M 2s
 10450K .......... .......... .......... .......... .......... 31% 50.9M 2s
 10500K .......... .......... .......... .......... .......... 31% 42.4M 2s
 10550K .......... .......... .......... .......... .......... 31% 48.6M 2s
 10600K .......... .......... .......... .......... .......... 31% 41.7M 2s
 10650K .......... .......... .......... .......... .......... 31% 48.6M 2s
 10700K .......... .......... .......... .......... .......... 32% 48.9M 2s
 10750K .......... .......... .......... .......... .......... 32% 43.6M 2s
 10800K .......... .......... .......... .......... .......... 32% 40.7M 2s
 10850K .......... .......... .......... .......... .......... 32% 47.4M 2s
 10900K .......... .......... .......... .......... .......... 32% 47.5M 2s
 10950K .......... .......... .......... .......... .......... 32% 47.9M 1s
 11000K .......... .......... .......... .......... .......... 32% 37.5M 1s
 11050K .......... .......... .......... .......... .......... 33% 45.3M 1s
 11100K .......... .......... .......... .......... .......... 33% 46.5M 1s
 11150K .......... .......... .......... .......... .......... 33% 48.0M 1s
 11200K .......... .......... .......... .......... .......... 33% 38.7M 1s
 11250K .......... .......... .......... .......... .......... 33%  339K 2s
 11300K .......... .......... .......... .......... .......... 33% 25.8M 2s
 11350K .......... .......... .......... .......... .......... 33% 10.4M 2s
 11400K .......... .......... .......... .......... .......... 34% 22.8M 2s
 11450K .......... .......... .......... .......... .......... 34% 32.7M 2s
 11500K .......... .......... .......... .......... .......... 34% 32.0M 2s
 11550K .......... .......... .......... .......... .......... 34% 36.2M 2s
 11600K .......... .......... .......... .......... .......... 34% 36.9M 2s
 11650K .......... .......... .......... .......... .......... 34% 44.7M 2s
 11700K .......... .......... .......... .......... .......... 35% 42.5M 2s
 11750K .......... .......... .......... .......... .......... 35% 45.1M 2s
 11800K .......... .......... .......... .......... .......... 35% 36.0M 2s
 11850K .......... .......... .......... .......... .......... 35% 42.9M 2s
 11900K .......... .......... .......... .......... .......... 35% 42.1M 2s
 11950K .......... .......... .......... .......... .......... 35% 40.8M 2s
 12000K .......... .......... .......... .......... .......... 35% 36.5M 2s
 12050K .......... .......... .......... .......... .......... 36% 45.6M 2s
 12100K .......... .......... .......... .......... .......... 36% 43.3M 2s
 12150K .......... .......... .......... .......... .......... 36% 42.0M 2s
 12200K .......... .......... .......... .......... .......... 36% 39.5M 2s
 12250K .......... .......... .......... .......... .......... 36% 41.7M 2s
 12300K .......... .......... .......... .......... .......... 36% 38.2M 2s
 12350K .......... .......... .......... .......... .......... 36% 43.2M 2s
 12400K .......... .......... .......... .......... .......... 37% 37.7M 2s
 12450K .......... .......... .......... .......... .......... 37% 41.8M 2s
 12500K .......... .......... .......... .......... .......... 37% 44.3M 2s
 12550K .......... .......... .......... .......... .......... 37% 40.5M 2s
 12600K .......... .......... .......... .......... .......... 37% 37.6M 2s
 12650K .......... .......... .......... .......... .......... 37% 47.8M 2s
 12700K .......... .......... .......... .......... .......... 38% 47.9M 2s
 12750K .......... .......... .......... .......... .......... 38% 46.1M 1s
 12800K .......... .......... .......... .......... .......... 38% 36.0M 1s
 12850K .......... .......... .......... .......... .......... 38% 45.3M 1s
 12900K .......... .......... .......... .......... .......... 38% 55.8M 1s
 12950K .......... .......... .......... .......... .......... 38% 51.2M 1s
 13000K .......... .......... .......... .......... .......... 38% 38.5M 1s
 13050K .......... .......... .......... .......... .......... 39% 49.3M 1s
 13100K .......... .......... .......... .......... .......... 39% 47.2M 1s
 13150K .......... .......... .......... .......... .......... 39% 45.6M 1s
 13200K .......... .......... .......... .......... .......... 39% 36.6M 1s
 13250K .......... .......... .......... .......... .......... 39% 48.2M 1s
 13300K .......... .......... .......... .......... .......... 39% 45.0M 1s
 13350K .......... .......... .......... .......... .......... 39% 51.2M 1s
 13400K .......... .......... .......... .......... .......... 40% 41.2M 1s
 13450K .......... .......... .......... .......... .......... 40% 51.9M 1s
 13500K .......... .......... .......... .......... .......... 40% 48.4M 1s
 13550K .......... .......... .......... .......... .......... 40% 46.0M 1s
 13600K .......... .......... .......... .......... .......... 40% 44.9M 1s
 13650K .......... .......... .......... .......... .......... 40% 50.5M 1s
 13700K .......... .......... .......... .......... .......... 40% 51.9M 1s
 13750K .......... .......... .......... .......... .......... 41% 46.8M 1s
 13800K .......... .......... .......... .......... .......... 41% 38.8M 1s
 13850K .......... .......... .......... .......... .......... 41% 50.9M 1s
 13900K .......... .......... .......... .......... .......... 41% 48.7M 1s
 13950K .......... .......... .......... .......... .......... 41% 48.6M 1s
 14000K .......... .......... .......... .......... .......... 41% 42.2M 1s
 14050K .......... .......... .......... .......... .......... 42% 45.6M 1s
 14100K .......... .......... .......... .......... .......... 42% 48.9M 1s
 14150K .......... .......... .......... .......... .......... 42% 43.3M 1s
 14200K .......... .......... .......... .......... .......... 42% 38.4M 1s
 14250K .......... .......... .......... .......... .......... 42% 43.0M 1s
 14300K .......... .......... .......... .......... .......... 42% 45.4M 1s
 14350K .......... .......... .......... .......... .......... 42% 41.2M 1s
 14400K .......... .......... .......... .......... .......... 43% 40.0M 1s
 14450K .......... .......... .......... .......... .......... 43% 48.0M 1s
 14500K .......... .......... .......... .......... .......... 43% 44.3M 1s
 14550K .......... .......... .......... .......... .......... 43% 46.2M 1s
 14600K .......... .......... .......... .......... .......... 43% 37.1M 1s
 14650K .......... .......... .......... .......... .......... 43% 49.2M 1s
 14700K .......... .......... .......... .......... .......... 43% 49.4M 1s
 14750K .......... .......... .......... .......... .......... 44% 47.0M 1s
 14800K .......... .......... .......... .......... .......... 44% 39.0M 1s
 14850K .......... .......... .......... .......... .......... 44% 44.9M 1s
 14900K .......... .......... .......... .......... .......... 44% 44.0M 1s
 14950K .......... .......... .......... .......... .......... 44% 48.1M 1s
 15000K .......... .......... .......... .......... .......... 44% 39.6M 1s
 15050K .......... .......... .......... .......... .......... 45% 44.7M 1s
 15100K .......... .......... .......... .......... .......... 45% 41.7M 1s
 15150K .......... .......... .......... .......... .......... 45% 42.4M 1s
 15200K .......... .......... .......... .......... .......... 45% 34.7M 1s
 15250K .......... .......... .......... .......... .......... 45% 46.8M 1s
 15300K .......... .......... .......... .......... .......... 45% 52.3M 1s
 15350K .......... .......... .......... .......... .......... 45% 44.3M 1s
 15400K .......... .......... .......... .......... .......... 46% 40.8M 1s
 15450K .......... .......... .......... .......... .......... 46% 52.2M 1s
 15500K .......... .......... .......... .......... .......... 46% 50.7M 1s
 15550K .......... .......... .......... .......... .......... 46% 51.2M 1s
 15600K .......... .......... .......... .......... .......... 46% 38.9M 1s
 15650K .......... .......... .......... .......... .......... 46% 49.0M 1s
 15700K .......... .......... .......... .......... .......... 46% 47.9M 1s
 15750K .......... .......... .......... .......... .......... 47% 48.0M 1s
 15800K .......... .......... .......... .......... .......... 47% 41.5M 1s
 15850K .......... .......... .......... .......... .......... 47% 42.3M 1s
 15900K .......... .......... .......... .......... .......... 47% 47.1M 1s
 15950K .......... .......... .......... .......... .......... 47% 47.6M 1s
 16000K .......... .......... .......... .......... .......... 47% 42.5M 1s
 16050K .......... .......... .......... .......... .......... 47% 48.1M 1s
 16100K .......... .......... .......... .......... .......... 48% 44.5M 1s
 16150K .......... .......... .......... .......... .......... 48% 50.3M 1s
 16200K .......... .......... .......... .......... .......... 48% 40.3M 1s
 16250K .......... .......... .......... .......... .......... 48% 55.5M 1s
 16300K .......... .......... .......... .......... .......... 48% 49.9M 1s
 16350K .......... .......... .......... .......... .......... 48% 34.6M 1s
 16400K .......... .......... .......... .......... .......... 49% 36.6M 1s
 16450K .......... .......... .......... .......... .......... 49% 51.7M 1s
 16500K .......... .......... .......... .......... .......... 49% 48.7M 1s
 16550K .......... .......... .......... .......... .......... 49% 49.8M 1s
 16600K .......... .......... .......... .......... .......... 49% 37.9M 1s
 16650K .......... .......... .......... .......... .......... 49% 48.4M 1s
 16700K .......... .......... .......... .......... .......... 49% 48.5M 1s
 16750K .......... .......... .......... .......... .......... 50% 48.5M 1s
 16800K .......... .......... .......... .......... .......... 50% 40.4M 1s
 16850K .......... .......... .......... .......... .......... 50% 43.4M 1s
 16900K .......... .......... .......... .......... .......... 50% 46.8M 1s
 16950K .......... .......... .......... .......... .......... 50% 49.7M 1s
 17000K .......... .......... .......... .......... .......... 50% 42.5M 1s
 17050K .......... .......... .......... .......... .......... 50% 48.6M 1s
 17100K .......... .......... .......... .......... .......... 51% 44.3M 1s
 17150K .......... .......... .......... .......... .......... 51% 45.5M 1s
 17200K .......... .......... .......... .......... .......... 51% 39.4M 1s
 17250K .......... .......... .......... .......... .......... 51% 48.0M 1s
 17300K .......... .......... .......... .......... .......... 51% 49.6M 1s
 17350K .......... .......... .......... .......... .......... 51% 47.1M 1s
 17400K .......... .......... .......... .......... .......... 52% 44.1M 1s
 17450K .......... .......... .......... .......... .......... 52% 46.8M 1s
 17500K .......... .......... .......... .......... .......... 52% 47.3M 1s
 17550K .......... .......... .......... .......... .......... 52% 43.6M 1s
 17600K .......... .......... .......... .......... .......... 52% 30.0M 1s
 17650K .......... .......... .......... .......... .......... 52% 47.2M 1s
 17700K .......... .......... .......... .......... .......... 52% 35.7M 1s
 17750K .......... .......... .......... .......... .......... 53% 41.9M 1s
 17800K .......... .......... .......... .......... .......... 53% 36.3M 1s
 17850K .......... .......... .......... .......... .......... 53% 46.3M 1s
 17900K .......... .......... .......... .......... .......... 53% 42.0M 1s
 17950K .......... .......... .......... .......... .......... 53% 44.7M 1s
 18000K .......... .......... .......... .......... .......... 53% 36.6M 1s
 18050K .......... .......... .......... .......... .......... 53% 40.5M 1s
 18100K .......... .......... .......... .......... .......... 54% 44.0M 1s
 18150K .......... .......... .......... .......... .......... 54% 43.0M 1s
 18200K .......... .......... .......... .......... .......... 54% 39.8M 1s
 18250K .......... .......... .......... .......... .......... 54% 42.4M 1s
 18300K .......... .......... .......... .......... .......... 54% 41.4M 1s
 18350K .......... .......... .......... .......... .......... 54% 48.0M 1s
 18400K .......... .......... .......... .......... .......... 55% 41.7M 1s
 18450K .......... .......... .......... .......... .......... 55% 47.1M 1s
 18500K .......... .......... .......... .......... .......... 55% 49.6M 1s
 18550K .......... .......... .......... .......... .......... 55% 45.2M 1s
 18600K .......... .......... .......... .......... .......... 55% 41.1M 1s
 18650K .......... .......... .......... .......... .......... 55% 42.1M 1s
 18700K .......... .......... .......... .......... .......... 55% 47.7M 1s
 18750K .......... .......... .......... .......... .......... 56% 51.1M 1s
 18800K .......... .......... .......... .......... .......... 56% 37.7M 1s
 18850K .......... .......... .......... .......... .......... 56% 46.0M 1s
 18900K .......... .......... .......... .......... .......... 56% 46.3M 1s
 18950K .......... .......... .......... .......... .......... 56% 49.4M 1s
 19000K .......... .......... .......... .......... .......... 56% 44.3M 1s
 19050K .......... .......... .......... .......... .......... 56% 48.8M 1s
 19100K .......... .......... .......... .......... .......... 57% 48.8M 1s
 19150K .......... .......... .......... .......... .......... 57% 42.8M 1s
 19200K .......... .......... .......... .......... .......... 57% 40.3M 1s
 19250K .......... .......... .......... .......... .......... 57% 46.7M 1s
 19300K .......... .......... .......... .......... .......... 57% 45.0M 1s
 19350K .......... .......... .......... .......... .......... 57% 48.8M 1s
 19400K .......... .......... .......... .......... .......... 57% 38.5M 1s
 19450K .......... .......... .......... .......... .......... 58% 51.8M 1s
 19500K .......... .......... .......... .......... .......... 58% 47.2M 1s
 19550K .......... .......... .......... .......... .......... 58% 45.2M 1s
 19600K .......... .......... .......... .......... .......... 58% 39.3M 1s
 19650K .......... .......... .......... .......... .......... 58% 42.5M 1s
 19700K .......... .......... .......... .......... .......... 58% 45.4M 1s
 19750K .......... .......... .......... .......... .......... 59% 45.5M 1s
 19800K .......... .......... .......... .......... .......... 59% 38.6M 1s
 19850K .......... .......... .......... .......... .......... 59% 44.4M 1s
 19900K .......... .......... .......... .......... .......... 59% 43.5M 1s
 19950K .......... .......... .......... .......... .......... 59% 43.9M 1s
 20000K .......... .......... .......... .......... .......... 59% 39.3M 1s
 20050K .......... .......... .......... .......... .......... 59% 46.5M 1s
 20100K .......... .......... .......... .......... .......... 60% 44.1M 1s
 20150K .......... .......... .......... .......... .......... 60% 43.8M 1s
 20200K .......... .......... .......... .......... .......... 60% 37.2M 1s
 20250K .......... .......... .......... .......... .......... 60% 45.3M 1s
 20300K .......... .......... .......... .......... .......... 60% 43.7M 1s
 20350K .......... .......... .......... .......... .......... 60% 43.6M 1s
 20400K .......... .......... .......... .......... .......... 60% 40.5M 1s
 20450K .......... .......... .......... .......... .......... 61% 43.0M 1s
 20500K .......... .......... .......... .......... .......... 61% 45.2M 1s
 20550K .......... .......... .......... .......... .......... 61% 43.7M 1s
 20600K .......... .......... .......... .......... .......... 61% 39.5M 1s
 20650K .......... .......... .......... .......... .......... 61% 45.7M 1s
 20700K .......... .......... .......... .......... .......... 61% 49.0M 1s
 20750K .......... .......... .......... .......... .......... 62% 46.8M 1s
 20800K .......... .......... .......... .......... .......... 62% 38.1M 1s
 20850K .......... .......... .......... .......... .......... 62% 46.4M 1s
 20900K .......... .......... .......... .......... .......... 62% 48.7M 1s
 20950K .......... .......... .......... .......... .......... 62% 41.1M 1s
 21000K .......... .......... .......... .......... .......... 62% 38.9M 1s
 21050K .......... .......... .......... .......... .......... 62% 45.4M 1s
 21100K .......... .......... .......... .......... .......... 63% 43.1M 1s
 21150K .......... .......... .......... .......... .......... 63% 47.2M 1s
 21200K .......... .......... .......... .......... .......... 63% 35.1M 1s
 21250K .......... .......... .......... .......... .......... 63% 45.8M 1s
 21300K .......... .......... .......... .......... .......... 63% 45.6M 1s
 21350K .......... .......... .......... .......... .......... 63% 44.2M 1s
 21400K .......... .......... .......... .......... .......... 63% 39.0M 1s
 21450K .......... .......... .......... .......... .......... 64% 42.3M 1s
 21500K .......... .......... .......... .......... .......... 64% 48.0M 1s
 21550K .......... .......... .......... .......... .......... 64% 46.1M 1s
 21600K .......... .......... .......... .......... .......... 64% 41.3M 1s
 21650K .......... .......... .......... .......... .......... 64% 43.1M 1s
 21700K .......... .......... .......... .......... .......... 64% 42.0M 1s
 21750K .......... .......... .......... .......... .......... 64% 49.9M 1s
 21800K .......... .......... .......... .......... .......... 65% 39.0M 1s
 21850K .......... .......... .......... .......... .......... 65% 51.1M 1s
 21900K .......... .......... .......... .......... .......... 65% 46.0M 1s
 21950K .......... .......... .......... .......... .......... 65% 46.8M 1s
 22000K .......... .......... .......... .......... .......... 65% 40.9M 1s
 22050K .......... .......... .......... .......... .......... 65% 47.8M 1s
 22100K .......... .......... .......... .......... .......... 66% 45.2M 1s
 22150K .......... .......... .......... .......... .......... 66% 48.2M 1s
 22200K .......... .......... .......... .......... .......... 66% 40.6M 1s
 22250K .......... .......... .......... .......... .......... 66% 53.0M 1s
 22300K .......... .......... .......... .......... .......... 66% 48.2M 1s
 22350K .......... .......... .......... .......... .......... 66% 49.0M 1s
 22400K .......... .......... .......... .......... .......... 66% 39.5M 1s
 22450K .......... .......... .......... .......... .......... 67% 45.4M 1s
 22500K .......... .......... .......... .......... .......... 67% 45.2M 1s
 22550K .......... .......... .......... .......... .......... 67% 45.1M 1s
 22600K .......... .......... .......... .......... .......... 67% 40.5M 1s
 22650K .......... .......... .......... .......... .......... 67% 47.1M 1s
 22700K .......... .......... .......... .......... .......... 67% 47.8M 1s
 22750K .......... .......... .......... .......... .......... 67% 47.3M 1s
 22800K .......... .......... .......... .......... .......... 68% 40.1M 1s
 22850K .......... .......... .......... .......... .......... 68% 44.5M 1s
 22900K .......... .......... .......... .......... .......... 68% 48.4M 1s
 22950K .......... .......... .......... .......... .......... 68% 50.2M 1s
 23000K .......... .......... .......... .......... .......... 68% 36.0M 1s
 23050K .......... .......... .......... .......... .......... 68% 48.4M 1s
 23100K .......... .......... .......... .......... .......... 69% 45.2M 1s
 23150K .......... .......... .......... .......... .......... 69% 44.7M 1s
 23200K .......... .......... .......... .......... .......... 69% 41.9M 1s
 23250K .......... .......... .......... .......... .......... 69% 41.8M 1s
 23300K .......... .......... .......... .......... .......... 69% 46.6M 1s
 23350K .......... .......... .......... .......... .......... 69% 45.0M 1s
 23400K .......... .......... .......... .......... .......... 69% 37.5M 0s
 23450K .......... .......... .......... .......... .......... 70% 49.6M 0s
 23500K .......... .......... .......... .......... .......... 70% 42.7M 0s
 23550K .......... .......... .......... .......... .......... 70% 45.9M 0s
 23600K .......... .......... .......... .......... .......... 70% 38.1M 0s
 23650K .......... .......... .......... .......... .......... 70% 45.5M 0s
 23700K .......... .......... .......... .......... .......... 70% 49.6M 0s
 23750K .......... .......... .......... .......... .......... 70% 42.8M 0s
 23800K .......... .......... .......... .......... .......... 71% 39.1M 0s
 23850K .......... .......... .......... .......... .......... 71% 45.1M 0s
 23900K .......... .......... .......... .......... .......... 71% 45.5M 0s
 23950K .......... .......... .......... .......... .......... 71% 49.1M 0s
 24000K .......... .......... .......... .......... .......... 71% 36.0M 0s
 24050K .......... .......... .......... .......... .......... 71% 50.0M 0s
 24100K .......... .......... .......... .......... .......... 71% 44.8M 0s
 24150K .......... .......... .......... .......... .......... 72% 46.3M 0s
 24200K .......... .......... .......... .......... .......... 72% 43.1M 0s
 24250K .......... .......... .......... .......... .......... 72% 46.6M 0s
 24300K .......... .......... .......... .......... .......... 72% 51.4M 0s
 24350K .......... .......... .......... .......... .......... 72% 42.7M 0s
 24400K .......... .......... .......... .......... .......... 72% 40.7M 0s
 24450K .......... .......... .......... .......... .......... 73% 46.0M 0s
 24500K .......... .......... .......... .......... .......... 73% 46.0M 0s
 24550K .......... .......... .......... .......... .......... 73% 44.0M 0s
 24600K .......... .......... .......... .......... .......... 73% 39.1M 0s
 24650K .......... .......... .......... .......... .......... 73% 45.6M 0s
 24700K .......... .......... .......... .......... .......... 73% 41.9M 0s
 24750K .......... .......... .......... .......... .......... 73% 42.3M 0s
 24800K .......... .......... .......... .......... .......... 74% 40.1M 0s
 24850K .......... .......... .......... .......... .......... 74% 42.8M 0s
 24900K .......... .......... .......... .......... .......... 74% 45.7M 0s
 24950K .......... .......... .......... .......... .......... 74% 47.1M 0s
 25000K .......... .......... .......... .......... .......... 74% 40.3M 0s
 25050K .......... .......... .......... .......... .......... 74% 45.9M 0s
 25100K .......... .......... .......... .......... .......... 74% 45.5M 0s
 25150K .......... .......... .......... .......... .......... 75% 46.5M 0s
 25200K .......... .......... .......... .......... .......... 75% 40.9M 0s
 25250K .......... .......... .......... .......... .......... 75% 45.4M 0s
 25300K .......... .......... .......... .......... .......... 75% 41.3M 0s
 25350K .......... .......... .......... .......... .......... 75% 43.9M 0s
 25400K .......... .......... .......... .......... .......... 75% 37.3M 0s
 25450K .......... .......... .......... .......... .......... 76% 43.8M 0s
 25500K .......... .......... .......... .......... .......... 76% 41.8M 0s
 25550K .......... .......... .......... .......... .......... 76% 43.2M 0s
 25600K .......... .......... .......... .......... .......... 76% 35.4M 0s
 25650K .......... .......... .......... .......... .......... 76% 41.5M 0s
 25700K .......... .......... .......... .......... .......... 76% 41.6M 0s
 25750K .......... .......... .......... .......... .......... 76% 45.9M 0s
 25800K .......... .......... .......... .......... .......... 77% 38.6M 0s
 25850K .......... .......... .......... .......... .......... 77% 41.5M 0s
 25900K .......... .......... .......... .......... .......... 77% 41.3M 0s
 25950K .......... .......... .......... .......... .......... 77% 42.1M 0s
 26000K .......... .......... .......... .......... .......... 77% 36.4M 0s
 26050K .......... .......... .......... .......... .......... 77% 42.3M 0s
 26100K .......... .......... .......... .......... .......... 77% 41.5M 0s
 26150K .......... .......... .......... .......... .......... 78% 41.1M 0s
 26200K .......... .......... .......... .......... .......... 78% 36.2M 0s
 26250K .......... .......... .......... .......... .......... 78% 43.8M 0s
 26300K .......... .......... .......... .......... .......... 78% 41.8M 0s
 26350K .......... .......... .......... .......... .......... 78% 40.7M 0s
 26400K .......... .......... .......... .......... .......... 78% 35.0M 0s
 26450K .......... .......... .......... .......... .......... 78% 43.1M 0s
 26500K .......... .......... .......... .......... .......... 79% 43.6M 0s
 26550K .......... .......... .......... .......... .......... 79% 41.7M 0s
 26600K .......... .......... .......... .......... .......... 79% 39.6M 0s
 26650K .......... .......... .......... .......... .......... 79% 48.3M 0s
 26700K .......... .......... .......... .......... .......... 79% 37.1M 0s
 26750K .......... .......... .......... .......... .......... 79% 41.2M 0s
 26800K .......... .......... .......... .......... .......... 80% 34.5M 0s
 26850K .......... .......... .......... .......... .......... 80% 43.4M 0s
 26900K .......... .......... .......... .......... .......... 80% 42.1M 0s
 26950K .......... .......... .......... .......... .......... 80% 41.6M 0s
 27000K .......... .......... .......... .......... .......... 80% 35.1M 0s
 27050K .......... .......... .......... .......... .......... 80% 41.0M 0s
 27100K .......... .......... .......... .......... .......... 80% 42.1M 0s
 27150K .......... .......... .......... .......... .......... 81% 42.2M 0s
 27200K .......... .......... .......... .......... .......... 81% 37.5M 0s
 27250K .......... .......... .......... .......... .......... 81% 42.4M 0s
 27300K .......... .......... .......... .......... .......... 81% 45.0M 0s
 27350K .......... .......... .......... .......... .......... 81% 44.4M 0s
 27400K .......... .......... .......... .......... .......... 81% 37.4M 0s
 27450K .......... .......... .......... .......... .......... 81% 47.7M 0s
 27500K .......... .......... .......... .......... .......... 82% 46.3M 0s
 27550K .......... .......... .......... .......... .......... 82% 46.3M 0s
 27600K .......... .......... .......... .......... .......... 82% 39.4M 0s
 27650K .......... .......... .......... .......... .......... 82% 43.6M 0s
 27700K .......... .......... .......... .......... .......... 82% 47.9M 0s
 27750K .......... .......... .......... .......... .......... 82% 43.7M 0s
 27800K .......... .......... .......... .......... .......... 83% 40.7M 0s
 27850K .......... .......... .......... .......... .......... 83% 52.1M 0s
 27900K .......... .......... .......... .......... .......... 83% 44.5M 0s
 27950K .......... .......... .......... .......... .......... 83% 45.4M 0s
 28000K .......... .......... .......... .......... .......... 83% 36.8M 0s
 28050K .......... .......... .......... .......... .......... 83% 49.7M 0s
 28100K .......... .......... .......... .......... .......... 83% 45.8M 0s
 28150K .......... .......... .......... .......... .......... 84% 46.0M 0s
 28200K .......... .......... .......... .......... .......... 84% 39.6M 0s
 28250K .......... .......... .......... .......... .......... 84% 45.6M 0s
 28300K .......... .......... .......... .......... .......... 84% 47.9M 0s
 28350K .......... .......... .......... .......... .......... 84% 43.4M 0s
 28400K .......... .......... .......... .......... .......... 84% 39.8M 0s
 28450K .......... .......... .......... .......... .......... 84% 44.4M 0s
 28500K .......... .......... .......... .......... .......... 85% 45.7M 0s
 28550K .......... .......... .......... .......... .......... 85% 47.1M 0s
 28600K .......... .......... .......... .......... .......... 85% 37.6M 0s
 28650K .......... .......... .......... .......... .......... 85% 42.7M 0s
 28700K .......... .......... .......... .......... .......... 85% 41.3M 0s
 28750K .......... .......... .......... .......... .......... 85% 43.5M 0s
 28800K .......... .......... .......... .......... .......... 86% 38.1M 0s
 28850K .......... .......... .......... .......... .......... 86% 44.4M 0s
 28900K .......... .......... .......... .......... .......... 86% 49.8M 0s
 28950K .......... .......... .......... .......... .......... 86% 43.1M 0s
 29000K .......... .......... .......... .......... .......... 86% 37.9M 0s
 29050K .......... .......... .......... .......... .......... 86% 48.0M 0s
 29100K .......... .......... .......... .......... .......... 86% 47.4M 0s
 29150K .......... .......... .......... .......... .......... 87% 49.6M 0s
 29200K .......... .......... .......... .......... .......... 87% 36.6M 0s
 29250K .......... .......... .......... .......... .......... 87% 44.7M 0s
 29300K .......... .......... .......... .......... .......... 87% 46.2M 0s
 29350K .......... .......... .......... .......... .......... 87% 45.4M 0s
 29400K .......... .......... .......... .......... .......... 87% 41.9M 0s
 29450K .......... .......... .......... .......... .......... 87% 43.0M 0s
 29500K .......... .......... .......... .......... .......... 88% 48.1M 0s
 29550K .......... .......... .......... .......... .......... 88% 43.4M 0s
 29600K .......... .......... .......... .......... .......... 88% 37.6M 0s
 29650K .......... .......... .......... .......... .......... 88% 47.2M 0s
 29700K .......... .......... .......... .......... .......... 88% 43.1M 0s
 29750K .......... .......... .......... .......... .......... 88% 44.0M 0s
 29800K .......... .......... .......... .......... .......... 88% 38.8M 0s
 29850K .......... .......... .......... .......... .......... 89% 45.0M 0s
 29900K .......... .......... .......... .......... .......... 89% 46.4M 0s
 29950K .......... .......... .......... .......... .......... 89% 43.6M 0s
 30000K .......... .......... .......... .......... .......... 89% 39.8M 0s
 30050K .......... .......... .......... .......... .......... 89% 44.4M 0s
 30100K .......... .......... .......... .......... .......... 89% 42.6M 0s
 30150K .......... .......... .......... .......... .......... 90% 45.0M 0s
 30200K .......... .......... .......... .......... .......... 90% 36.6M 0s
 30250K .......... .......... .......... .......... .......... 90% 44.4M 0s
 30300K .......... .......... .......... .......... .......... 90% 43.8M 0s
 30350K .......... .......... .......... .......... .......... 90% 47.6M 0s
 30400K .......... .......... .......... .......... .......... 90% 42.6M 0s
 30450K .......... .......... .......... .......... .......... 90% 42.9M 0s
 30500K .......... .......... .......... .......... .......... 91% 45.1M 0s
 30550K .......... .......... .......... .......... .......... 91% 46.3M 0s
 30600K .......... .......... .......... .......... .......... 91% 40.0M 0s
 30650K .......... .......... .......... .......... .......... 91% 42.0M 0s
 30700K .......... .......... .......... .......... .......... 91% 48.6M 0s
 30750K .......... .......... .......... .......... .......... 91% 44.8M 0s
 30800K .......... .......... .......... .......... .......... 91% 43.7M 0s
 30850K .......... .......... .......... .......... .......... 92% 50.2M 0s
 30900K .......... .......... .......... .......... .......... 92% 44.7M 0s
 30950K .......... .......... .......... .......... .......... 92% 44.1M 0s
 31000K .......... .......... .......... .......... .......... 92% 41.5M 0s
 31050K .......... .......... .......... .......... .......... 92% 46.2M 0s
 31100K .......... .......... .......... .......... .......... 92% 50.1M 0s
 31150K .......... .......... .......... .......... .......... 93% 44.5M 0s
 31200K .......... .......... .......... .......... .......... 93% 42.3M 0s
 31250K .......... .......... .......... .......... .......... 93% 50.2M 0s
 31300K .......... .......... .......... .......... .......... 93% 48.7M 0s
 31350K .......... .......... .......... .......... .......... 93% 48.9M 0s
 31400K .......... .......... .......... .......... .......... 93% 39.4M 0s
 31450K .......... .......... .......... .......... .......... 93% 51.0M 0s
 31500K .......... .......... .......... .......... .......... 94% 51.1M 0s
 31550K .......... .......... .......... .......... .......... 94% 45.8M 0s
 31600K .......... .......... .......... .......... .......... 94% 41.9M 0s
 31650K .......... .......... .......... .......... .......... 94% 47.9M 0s
 31700K .......... .......... .......... .......... .......... 94% 47.3M 0s
 31750K .......... .......... .......... .......... .......... 94% 43.7M 0s
 31800K .......... .......... .......... .......... .......... 94% 45.6M 0s
 31850K .......... .......... .......... .......... .......... 95% 43.8M 0s
 31900K .......... .......... .......... .......... .......... 95% 46.5M 0s
 31950K .......... .......... .......... .......... .......... 95% 45.1M 0s
 32000K .......... .......... .......... .......... .......... 95% 44.4M 0s
 32050K .......... .......... .......... .......... .......... 95% 45.5M 0s
 32100K .......... .......... .......... .......... .......... 95% 47.8M 0s
 32150K .......... .......... .......... .......... .......... 95% 49.1M 0s
 32200K .......... .......... .......... .......... .......... 96% 39.0M 0s
 32250K .......... .......... .......... .......... .......... 96% 50.8M 0s
 32300K .......... .......... .......... .......... .......... 96% 48.2M 0s
 32350K .......... .......... .......... .......... .......... 96% 44.8M 0s
 32400K .......... .......... .......... .......... .......... 96% 42.7M 0s
 32450K .......... .......... .......... .......... .......... 96% 45.1M 0s
 32500K .......... .......... .......... .......... .......... 97% 50.5M 0s
 32550K .......... .......... .......... .......... .......... 97% 48.4M 0s
 32600K .......... .......... .......... .......... .......... 97% 39.2M 0s
 32650K .......... .......... .......... .......... .......... 97% 51.3M 0s
 32700K .......... .......... .......... .......... .......... 97% 44.6M 0s
 32750K .......... .......... .......... .......... .......... 97% 39.1M 0s
 32800K .......... .......... .......... .......... .......... 97% 35.8M 0s
 32850K .......... .......... .......... .......... .......... 98% 43.0M 0s
 32900K .......... .......... .......... .......... .......... 98% 48.9M 0s
 32950K .......... .......... .......... .......... .......... 98% 41.4M 0s
 33000K .......... .......... .......... .......... .......... 98% 36.0M 0s
 33050K .......... .......... .......... .......... .......... 98% 49.4M 0s
 33100K .......... .......... .......... .......... .......... 98% 44.9M 0s
 33150K .......... .......... .......... .......... .......... 98% 50.5M 0s
 33200K .......... .......... .......... .......... .......... 99% 39.2M 0s
 33250K .......... .......... .......... .......... .......... 99% 46.2M 0s
 33300K .......... .......... .......... .......... .......... 99% 50.5M 0s
 33350K .......... .......... .......... .......... .......... 99% 44.9M 0s
 33400K .......... .......... .......... .......... .......... 99% 43.5M 0s
 33450K .......... .......... .......... .......... .......... 99% 44.2M 0s
 33500K .......... .......... .......... .......... .....     100% 41.7M=1.4s

2024-09-17 08:23:59 (23.7 MB/s) - ‘input/GSE145998_Ctrl_mouse_Snuc_comb.dge.txt.gz’ saved [34350155/34350155]
Code
sn_mouse_lung <- read.csv("input/GSE145998_Ctrl_mouse_Snuc_comb.dge.txt", sep = ",", header = TRUE) %>%
    column_to_rownames("X")
sn_mouse_lung_md <- read.table("input/GSE145998_nuc_meta.txt")
#meta data doesn't have all the cells that exp matrix has
sn_mouse_lung <- sn_mouse_lung[,rownames(sn_mouse_lung_md)]

mouse_lung_ref <-
    SummarizedExperiment(assays = list(logcounts = log(1 + sn_mouse_lung)),
                         colData = sn_mouse_lung_md)

Re-Run SingleR

Code
#make merged_samples a SummarizedExperiment
merged_sumexp <- SummarizedExperiment(assays = list(logcounts = log(1 + GetAssayData(JoinLayers(merged_samples), layer = "counts"))),
                                      colData = merged_samples@meta.data)
Warning in asMethod(object): sparse->dense coercion: allocating vector of size
2.5 GiB
Code
sn_preds <- SingleR(test = merged_sumexp,
                    ref = mouse_lung_ref,
                    labels = mouse_lung_ref$cell_type,
                    aggr.ref = TRUE)
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth =
TRUE, : You're computing too large a percentage of total singular values, use a
standard svd instead.

Add Predictions to Seurat Object

Now I’ll add these predictions back to my seurat object and visualize them.

I’m also going to add a column called “coarse_cell_type” that has more general annotations based on the single nuclei cell type predictions.

Code
merged_samples <- AddMetaData(object = merged_samples,
                              metadata = sn_preds$labels,
                              col.name = "sn_cell_type")

DimPlot(merged_samples, group.by = "sn_cell_type", label = TRUE)

Code
# Make coarser labels
merged_samples@meta.data <-
    mutate(merged_samples@meta.data,
           coarse_cell_type = recode(sn_cell_type,
                                     "AT1" = "Epithelial",
                                     "AT2" = "Epithelial",
                                     "Epi" = "Epithelial",
                                     "aEC" = "Endothelial",
                                     "CapEC" = "Endothelial",
                                     "EC1" = "Endothelial",
                                     "LEC" = "Endothelial",
                                     "vEC" = "Endothelial",
                                     "AM" = "Mac",
                                     "IM" = "Mac",
                                     "BC" = "B cell",
                                     "GB" = "B cell",
                                     "FB1" = "FB",
                                     "FB2" = "FB",
                                     "matFB" = "FB",
                                     "MyoFB" = "FB",
                                     "TC" = "T cell",
                                     "TC3" = "T cell",
                                     "Th17 TC" = "T cell"))

DimPlot(merged_samples, group.by = "coarse_cell_type", label = TRUE)

Cell Type Counts

Code
merged_samples@meta.data %>%
    select(sample_id, coarse_cell_type) %>%
    group_by(sample_id, coarse_cell_type) %>%
    mutate(total = n()) %>%
    unique() %>%
    pivot_wider(names_from = "sample_id", values_from = "total")
coarse_cell_type S0283 S0284 S0291
Epithelial 1678 1098 1511
Mes 91 18 41
Mac 683 415 2203
Endothelial 1432 484 848
FB 968 642 586
Peri 147 63 105
Cil 85 142 48
Club 230 303 118
T cell 106 174 224
B cell 51 41 109
Mono 95 55 382
DC 30 26 96
SMC 12 5 273
Bas 2 8 59

What the cell type abbreviations are

  • aEC: arterial endothelial cells
  • AM: alveolar macs
  • AT1: alveolar type 1 epithelial cells
  • AT2: alveolar type 2 epithelial cells
  • BC: B cells
  • CapEC: capillary endothelial cells
  • DC: dendritic cells
  • Div: dividing cells
  • EC: endothelial cells
  • Epi: epithelial cells
  • FB: fibroblasts
  • GB: germinal B cells
  • IM: interstitial macs
  • LEC: lymphatic endothelial cells
  • Mes: mesolthelial cells
  • Mono: monocytes
  • MyoFB: myofibroblasts
  • Peri: pericytes
  • SMC: smooth muscle cells
  • TC: T cells

Identify Cancer Cells

I’m interested in seeing what clusters are cancer cells. To do this I will look at SingleR annotations and expression of cancer marker genes.

Code
cancer_features <- c("Col1a1", "Col1a2", "Satb2", "Psmd4", "Mif", "Runx2")

r_feature_plot(merged_samples, features = cancer_features, coord.fixed = TRUE, ncol = 3)
Coordinate system already present. Adding new coordinate system, which will
replace the existing one.

Code
DimPlot(merged_samples, label = TRUE)

Based on expression of these marker genes, as well as the cell type labels both with a single-cell reference and a single-nucleus reference, I think seurat cluster 2 is the cancer cells in this dataset.

Code
for (i in seq_len(nrow(merged_samples@meta.data))) {
    if (merged_samples$seurat_clusters[i] == 2) {
        merged_samples$sn_cell_type[i] <- "tumor"
        merged_samples$coarse_cell_type[i] <- "tumor"
    }
}
qs::qsave(merged_samples, "output/rdata/nucleus/merged_samples.qs")

snRNA Epithelial Analysis

We are interested in how the epithelial cell subpopulations differ between the treated and untreated samples.

Subset Epithelial Cells

First I need to subset out the epithelial cells.

Code
epi <- subset(merged_samples, coarse_cell_type == "Epithelial") %>%
    process_seurat()
Warning: Different features in new layer data than already exists for
scale.data
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Code
DimPlot(epi, group.by = "sn_cell_type", split.by = "sample_id", label = TRUE, ncol = 1, pt.size = 1) +
    NoLegend()

Find Markers For Each Cluster

Code
Idents(epi) <- epi$seurat_clusters
epi_degs <- FindAllMarkers(JoinLayers(epi))
Calculating cluster 0
Calculating cluster 1
Calculating cluster 2
Calculating cluster 3
Calculating cluster 4
Calculating cluster 5
Calculating cluster 6
Calculating cluster 7
Calculating cluster 8
Code
write.table(epi_degs, "results/de/epi_degs.tsv")
# get top 10 degs for each cluster
epi_degs %>%
    group_by(cluster) %>%
    subset(abs(avg_log2FC) > 1 &
           p_val_adj < 0.05) %>%
    arrange(desc(abs(avg_log2FC))) %>%
    slice_head(n = 10)
p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene
0e+00 -8.454526 0.004 0.231 0.0000000 0 Celf4
0e+00 -8.445058 0.002 0.239 0.0000000 0 Gm13449
0e+00 -8.400426 0.000 0.043 0.0000000 0 Adamts18
0e+00 -8.365808 0.000 0.130 0.0000000 0 Gm37229
0e+00 -8.027256 0.000 0.067 0.0000000 0 Gm28410
0e+00 -8.020713 0.000 0.076 0.0000000 0 Trpv6
0e+00 -7.942894 0.000 0.089 0.0000000 0 Vmn2r9
0e+00 -7.828613 0.000 0.049 0.0000000 0 Tnfaip8l3
0e+00 -7.819156 0.001 0.103 0.0000000 0 Abca17
0e+00 -7.691807 0.000 0.036 0.0000027 0 Spink5
0e+00 -8.003937 0.000 0.075 0.0000000 1 Pak3
0e+00 -7.898028 0.000 0.040 0.0000589 1 Adamts18
0e+00 -7.585547 0.000 0.086 0.0000000 1 Prkag3
0e+00 -7.358821 0.000 0.076 0.0000000 1 Cyp4f39
0e+00 -7.033010 0.000 0.070 0.0000000 1 Gm42836
0e+00 -6.827256 0.000 0.040 0.0000441 1 Myo16
0e+00 -6.589299 0.000 0.050 0.0000004 1 Fgf18
0e+00 -6.573138 0.001 0.189 0.0000000 1 Gm16212
0e+00 -6.495547 0.000 0.059 0.0000000 1 1700007J10Rik
0e+00 -6.478018 0.001 0.158 0.0000000 1 Gm4128
0e+00 7.261987 0.021 0.000 0.0000000 2 Gm49083
0e+00 -7.042942 0.021 0.347 0.0000000 2 Scel
0e+00 -6.982860 0.007 0.250 0.0000000 2 Slc22a22
0e+00 6.969663 0.022 0.000 0.0000000 2 1700001K23Rik
0e+00 6.584269 0.025 0.000 0.0000000 2 Ly6h
0e+00 6.522426 0.041 0.000 0.0000000 2 Gm15413
0e+00 -6.506436 0.029 0.383 0.0000000 2 Rasgrf2
0e+00 -6.424095 0.001 0.056 0.0000047 2 Syt14
0e+00 -6.319652 0.012 0.364 0.0000000 2 Cyp2b10
0e+00 6.307664 0.015 0.000 0.0000000 2 9530036M11Rik
0e+00 -9.101717 0.000 0.275 0.0000000 3 Lgals1
0e+00 -8.116386 0.000 0.171 0.0000000 3 Gm16212
0e+00 -7.778386 0.000 0.142 0.0000000 3 Gm4128
0e+00 -6.893804 0.000 0.095 0.0000000 3 Wnt7a
0e+00 -6.678174 0.002 0.160 0.0000000 3 Gm26936
0e+00 -6.656838 0.000 0.090 0.0000001 3 Ctsl
0e+00 -6.644394 0.002 0.180 0.0000000 3 Gm36569
0e+00 -6.628311 0.000 0.075 0.0000050 3 Cys1
0e+00 -6.626483 0.000 0.079 0.0000013 3 Myrf
7e-07 -6.586164 0.000 0.047 0.0149508 3 Crym
0e+00 -9.487359 0.002 0.327 0.0000000 4 Scel
0e+00 -7.587490 0.006 0.233 0.0000000 4 Slc22a22
0e+00 -6.934746 0.013 0.361 0.0000000 4 Rasgrf2
0e+00 -6.891701 0.004 0.184 0.0000000 4 Grem2
0e+00 6.721821 0.011 0.000 0.0000108 4 Gm44778
0e+00 -6.632696 0.000 0.078 0.0000061 4 Gm14964
4e-07 -6.364899 0.000 0.052 0.0079545 4 Syt14
0e+00 6.280368 0.011 0.000 0.0000108 4 Gm16034
0e+00 -6.084383 0.053 0.411 0.0000000 4 Cped1
0e+00 -6.020644 0.017 0.232 0.0000000 4 Wdr17
0e+00 -9.664834 0.008 0.378 0.0000000 5 Acoxl
0e+00 -8.633159 0.039 0.649 0.0000000 5 Ppp1r14c
0e+00 -8.245695 0.023 0.570 0.0000000 5 Arhgef38
0e+00 8.063092 0.086 0.000 0.0000000 5 Gpr37
0e+00 -7.938698 0.008 0.229 0.0000715 5 Slc26a4
0e+00 -7.927924 0.008 0.349 0.0000000 5 Aox3
1e-07 -7.852975 0.000 0.185 0.0019767 5 Galnt13
0e+00 -7.788475 0.008 0.550 0.0000000 5 Etv5
0e+00 -7.590891 0.008 0.306 0.0000000 5 Il33
0e+00 -7.467025 0.039 0.463 0.0000000 5 Tmem132d
0e+00 7.517706 0.018 0.000 0.0000000 6 Gm16140
0e+00 7.375482 0.018 0.000 0.0000000 6 Mrap
0e+00 7.168329 0.018 0.000 0.0000000 6 Gm40639
0e+00 6.473479 0.018 0.000 0.0000001 6 Gm36617
0e+00 6.325735 0.126 0.004 0.0000000 6 Lmntd1
0e+00 5.961271 0.027 0.001 0.0000000 6 Gm30003
0e+00 5.888027 0.162 0.010 0.0000000 6 Tmeff2
0e+00 5.873991 0.045 0.002 0.0000000 6 Esm1
0e+00 5.822639 0.090 0.003 0.0000000 6 Reep1
0e+00 5.820102 0.018 0.000 0.0000995 6 Gm16064
0e+00 8.519997 0.019 0.000 0.0000000 7 Clec4a4
0e+00 8.344777 0.019 0.000 0.0000000 7 Gm40304
0e+00 7.875715 0.019 0.000 0.0000000 7 Gm49101
0e+00 7.616976 0.028 0.000 0.0000000 7 Nsg2
0e+00 7.609910 0.019 0.000 0.0000000 7 Cyp2c69
0e+00 7.579176 0.028 0.000 0.0000000 7 Gm46189
0e+00 7.458157 0.019 0.000 0.0000000 7 Gm31887
0e+00 7.452691 0.019 0.000 0.0000000 7 Gm13373
0e+00 7.451283 0.019 0.000 0.0000000 7 Treml1
0e+00 7.286057 0.019 0.000 0.0000000 7 F830208F22Rik
0e+00 10.307620 0.134 0.000 0.0000000 8 Gm550
0e+00 7.804335 0.351 0.002 0.0000000 8 Ankle1
0e+00 7.624085 0.041 0.000 0.0000000 8 Gm20667
0e+00 7.576696 0.299 0.002 0.0000000 8 Pif1
4e-07 -7.483834 0.000 0.212 0.0097898 8 Slc22a22
0e+00 7.421497 0.268 0.001 0.0000000 8 Ska1
0e+00 7.378074 0.505 0.005 0.0000000 8 Cep55
0e+00 7.352716 0.031 0.000 0.0000000 8 Gm2788
0e+00 7.328367 0.402 0.004 0.0000000 8 Gm47207
0e+00 7.259813 0.485 0.004 0.0000000 8 Depdc1b

Find All Markers for Cell Types

Code
Idents(epi) <- epi$sn_cell_type

epi_subtype_degs <- FindAllMarkers(JoinLayers(epi))
Calculating cluster AT1
Calculating cluster AT2
Code
write.table(epi_subtype_degs, "results/de/epi_subptype_degs.tsv")

These markers aren’t very informative, I’m going to try comparing these datasets to single cell datasets and seeing if there’s overlap with the already annotated datasets.

Compare to Single-Cell Data

Run Singler using 2 Epithelial Datasets

I want to re-run singler using two single-cell datasets: murine_aec and mouse_lung. The hope is that mouse_lung will correctly label these cells as lung epithelial, and then the murine_aec-based predictions will elucidate general cell type composition. However I will verify the final cell types with James. I’m going to merge mouse_lung and murine_aec for this.

Code
murine_aec <- qs::qread("output/rdata/murine_aec_post-filtering.qs") %>%
    process_seurat()
Warning: Number of dimensions changing from 30 to 50
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Code
murine_aec@meta.data <- murine_aec@meta.data %>%
    mutate(paper_cell_type =
           recode(seurat_clusters,
                  `0` = "DATP",
                  `1` = "AEC2",
                  `2` = "DATP",
                  `3` = "pAEC2",
                  `4` = "cAEC2",
                  `5` = "AEC1",
                  `6` = "cAEC2"))

epi$rna_type <- "single nucleus"
Code
mouse_lung <-
    qs::qread("/home/gdrobertslab/lab/GenRef/sc_ref_datasets/mouse/GSE151974/mouse_lung_ref.qs")

#give same column to hold ref labels
murine_aec$ref_label <- murine_aec$paper_cell_type
mouse_lung$ref_label <- mouse_lung$CellType

merged_ref <- merge(murine_aec, mouse_lung) %>%
    process_seurat()
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Code
merged_preds <- SingleR::SingleR(test = as.SingleCellExperiment(JoinLayers(epi)),
                                 ref = as.SingleCellExperiment(merged_ref),
                                 labels = merged_ref$ref_label,
                                 aggr.ref = TRUE)
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth =
TRUE, : You're computing too large a percentage of total singular values, use a
standard svd instead.
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth =
TRUE, : You're computing too large a percentage of total singular values, use a
standard svd instead.
Code
write.table(merged_preds, "results/cell_type_annotation/merged_preds.tsv")

epi <- AddMetaData(object = epi,
                   metadata = merged_preds$labels,
                   col.name = "merged_preds")

DimPlot(epi, group.by = "merged_preds")

Code
group_by(epi@meta.data, merged_preds) %>%
    select(merged_preds) %>%
    mutate(total = n()) %>%
    unique() %>%
    arrange(desc(total)) %>%
    interactive_dt()

There’s too many different annotations so I’m going to condense some of the different types.

Code
# I need to rename some of the cell types so they match up between the references.
epi@meta.data <- epi@meta.data %>%
    mutate(condensed_cts = recode(merged_preds,
                                  "Col14a1+ fibroblast" = "FB",
                                  "Col13a1+ fibroblast" = "FB",
                                  "Int Mf" = "FB",
                                  "Alv Mf" = "FB",
                                  "Myofibroblast" = "FB",
                                  "CD4 T cell 1" = "Immune",
                                  "DC2" = "Immune",
                                  "B cell 1" = "Immune",
                                  "gd T cell" = "Immune",
                                  "Mono" = "Immune",
                                  "SMC" = "Mural",
                                  "Pericyte 1" = "Mural",
                                  "Pericyte 2" = "Mural",
                                  "Cap" = "Blood vessel",
                                  "Vein" = "Blood vessel",
                                  "Cap-a" = "Blood vessel",
                                  "Art" = "Blood vessel"))

DimPlot(epi, group.by = "condensed_cts", label = TRUE)

Code
small_epi <- subset(epi,
       condensed_cts %in% c("AEC1",
                            "AEC2",
                            "AT1",
                            "AT2 1",
                            "AT2 2",
                            "cAEC2",
                            "Ciliated",
                            "Club",
                            "DATP",
                            "pAEC2")) %>%
    process_seurat() %>%
    JoinLayers()
Warning: Different features in new layer data than already exists for
scale.data
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Code
table(small_epi$condensed_cts, small_epi$seurat_clusters)
          
              0    1    2    3    4    5    6    7
  AEC1       84    3    1    0    0   13    2    0
  AEC2        0    4   93    8   10    4    0    0
  AT1      1026   20  241   22   66   96  128   11
  AT2 1       7  378  218  182  180   52    0   12
  AT2 2       1  297   59  135   86   20    0    8
  cAEC2       0    0   14    1    2    1    0   61
  Ciliated    0    2    0    2    0    0    0    0
  Club        0    4   13    2    0    2    0    0
  DATP        1   26   63   58   45    6    0    2
  pAEC2       0  124    1   89   80    4    0    1
Code
DimPlot(small_epi, group.by = c("condensed_cts", "seurat_clusters"), label = TRUE)

Based on these last two plots, I’m still not entirely sure about the best way to go about annotating some of these cell types. I’m fairly certain clusters 1 and 5 are AEC1, and that cluster 0 is alveolar type 2 (likely pAEC2?). However, clusters 2-4 and 7 are a hodge podge of type 1 and type 2 epithelial cells. I’m wondering if this resolution is too granular for the sc to sn annotations to be informative.

Find Markers in small_epi

James will use these markers to make decisions on cell types for each seurat cluster.

Code
small_epi_markers <- FindAllMarkers(small_epi)
Calculating cluster 0
Calculating cluster 1
Calculating cluster 2
Calculating cluster 3
Calculating cluster 4
Calculating cluster 5
Calculating cluster 6
Calculating cluster 7
Code
write.table(small_epi_markers, "results/de/small_epi_markers.csv")

Label Cell Types

I’m going to remove fibroblasts and endothelial cells before reclustering and labelling. I’m going to keep the code chunk below commented out because these assignments were made before I subsetted out the tumor cells. I’m using overlap between clusters in the new object and the old one to make my new assignments.

Code
# small_epi@meta.data <- small_epi@meta.data %>%
#     mutate(james_cell_type = recode(seurat_clusters,
#                                     `0` = "AEC2",
#                                     `1` = "AEC1",
#                                     `2` = "DATP",
#                                     `3` = "primed AEC2",
#                                     `4` = "FB",
#                                     `5` = "AEC1",
#                                     `6` = "endothelial",
#                                     `7` = "pAEC2"))
# qs::qsave(small_epi, "output/rdata/small_epi.qs")

old_small_epi <- qs::qread("output/rdata/nucleus/small_epi.qs")

cluster_membership <- list()
for (i in sort(unique(small_epi$seurat_clusters))) {
    #get cell barcodes for the current cluster
    new_bcs <- subset(small_epi@meta.data, seurat_clusters == i) %>%
        rownames()
    #initialize vector to hold % of new_bcs that matchup with the old clusters
    tmp <- c()

    #loop through old object's clusters and see where new_bcs fall within them
    for (j in sort(unique(old_small_epi$seurat_clusters))) {
        old_bcs <- subset(old_small_epi@meta.data, seurat_clusters == j) %>%
            rownames()
        tmp[j] <- mean(new_bcs %in% old_bcs)
    }

    tmp <- setNames(tmp, sort(unique(old_small_epi$seurat_clusters)))
    cluster_membership[[i]] <- tmp
}
cluster_membership

Based on these overlaps, I’m going to make the following cell type assignments for my new small_epi object.

0: AEC1 1: AEC2 2: DATP 3: AEC2 4: primed AEC2 5: AEC1 6: pAEC2

Code
#Remove fibroblasts and endothelial cells
small_epi <- subset(small_epi,
                    seurat_clusters != 6 &
                    seurat_clusters != 7) %>%
    process_seurat()
Warning: Different features in new layer data than already exists for
scale.data
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Code
small_epi@meta.data <- small_epi@meta.data %>%
    mutate(james_cell_type = recode(seurat_clusters,
                                    `0` = "AEC1",
                                    `1` = "AEC2",
                                    `2` = "DATP",
                                    `3` = "pAEC2",
                                    `4` = "pAEC2",
                                    `5` = "AEC1",
                                    `6` = "cAEC2"),
           treatment = recode(sample_id,
                              "S0291" = "Control",
                              "S0283" = "Late Nintedanib",
                              "S0284" = "Early Nintedanib"))

qs::qsave(small_epi, "output/rdata/nucleus/new_small_epi.qs")

pdf("output/figures/nucleus/epithelial_split_umap.pdf", width = 12, height = 4)
r_dim_plot(small_epi,
           group.by = "james_cell_type",
           split.by = "treatment",
           label = TRUE) +
    NoLegend()
dev.off()
png 
  2 
Code
#make bar plots showing proportions of cell types
small_epi_cts <- small_epi@meta.data %>%
    group_by(james_cell_type, treatment) %>%
    mutate(total = n()) %>%
    dplyr::select(c(treatment, james_cell_type, total)) %>%
    unique()

pdf("output/figures/nucleus/epithelial_pct_plot.pdf")
ggplot(small_epi_cts, aes(fill = james_cell_type, y = total, x = treatment)) +
    geom_bar(position = "fill", stat = "identity")
dev.off()
png 
  2 

Examine Epithelial Populations Across Treatments

It is interesting to see how the epithelial subpopulation composition differs between treatment groups.

Code
DimPlot(small_epi, split.by = "treatment", group.by = "james_cell_type")

snRNA Myeloid Analysis

We are interested in how the myeloid population in the treated samples compare to that in the untreated sample. This will include looking at what subpopulations are enriched, what genes are differentially expressed, and what pathways are differentially regulated.

Subset Out Myeloid Cells

Code
if (!exists("merged_samples")) merged_samples <- qs::qread("output/rdata/nucleus/merged_samples.qs")

The first step to this analysis is to subset out the myeloid cells. I will do this according to the SingleR cell type predictions.

Code
myeloid <- subset(merged_samples, idents = c(0, 12)) %>%
    process_seurat(resolution = 1)
Warning: Different features in new layer data than already exists for
scale.data
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Code
# Visualize new umap
r_dim_plot(myeloid, split.by = "treatment")

Code
r_dim_plot(myeloid, split.by = "treatment", group.by = "sn_cell_type")

Code
# Look at markers
mac_genes <- c(
    "Mmp9", "Ctsk", "Nfatc1", # Osteoclast-TAMs, https://doi.org/10.1186/s41232-022-00213-x
    "Cd9", "Trem2", "Spp1", "Gpnmb", # Scar-associated, see https://doi.org/10.1038/s41586-019-1631-3
    "Ms4a7", "Selenop", # TAMs, generic
    "Ifit1", "Ifit2", "Cxcl10", # IFN
    "Il1a", "Il1b", "Fn1",  # Inflammatory
    "Marco", "Siglecf", "Itgax", # Alveolar
    "Top2a", "Hist1h1b", "Birc5",  # Cycling
    "Ly6c2", "Sell", "Ifitm6",  # cMonocyte
    "Itgal", "Ace", "Spn", # ncMonocyte
    "Cd209a", "Ccr7", "Dcstamp", # Pre-DC
    "Xcr1", "Clec9a", "Irf8") # DC1

DotPlot(myeloid,
    features = mac_genes,
    # group.by = "macs_assignment",
    cols = "RdBu",
    col.max = 1.5) +
    scale_y_discrete(limits = rev) +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
    coord_fixed()

Code
myeloid <- RenameIdents(myeloid,
    "0" = "TAM",
    "1" = "TAM",
    "2" = "Alveolar",
    "3" = "Scar",
    "4" = "TAM",
    "5" = "TAM",
    "6" = "Cycling",
    "7" = "Pre-DC",
    "8" = "ncMonocyte",
    "9" = "Pre-DC",
    "10" = "DC1",
    "11" = "Scar",
    "12" = "IFN")

r_dim_plot(myeloid, split.by = "treatment")

Code
myeloid$macs_assignment <- Idents(myeloid) %>%
    factor(levels = c(
        "Osteoclast",
        "Scar",
        "TAM",
        "IFN",
        "Inflammatory",
        "Cycling",
        "Alveolar",
        "Interstitial",
        "cMonocyte",
        "ncMonocyte",
        "Pre-DC",
        "DC1"))

qs::qsave(myeloid, "output/rdata/nucleus/myeloid_processed.qs")

Plot Markers for Each Cluster

Code
DotPlot(myeloid,
    features = mac_genes,
    group.by = "macs_assignment",
    cols = "RdBu",
    col.max = 1.5) +
    scale_y_discrete(limits = rev) +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
    coord_fixed()

Final snRNA Cell Type Assignment

Looking At Whole Dataset

I’m going to see how the cell type proportions change in the whole dataset.

First I need to add james’ cell types into merged_samples

I need to add the cell types from small_epi as well.

Code
merged_samples$james_cell_type <- merged_samples$coarse_cell_type
merged_samples@meta.data[colnames(myeloid),]$james_cell_type <- myeloid$macs_assignment


#now add in annotations from small_epi
merged_samples@meta.data[colnames(small_epi),]$james_cell_type <- small_epi$james_cell_type
merged_samples@meta.data <- merged_samples@meta.data %>%
    mutate(james_cell_type = recode(james_cell_type,
                                    `1` = "AEC2",
                                    `2` = "AEC1",
                                    `3` = "DATP",
                                    `4` = "primed AEC2",
                                    `5` = "FB",
                                    `6` = "endothelial",
                                    `7` = "pAEC2"))


merged_samples$james_cell_type <- ifelse(merged_samples$seurat_clusters == 2,
                                         "tumor",
                                         merged_samples$james_cell_type)

all_cts <- merged_samples@meta.data %>%
    group_by(treatment, james_cell_type) %>%
    mutate(total = n()) %>%
    select("treatment", "james_cell_type", "total") %>%
    unique() %>%
    pivot_wider(names_from = "treatment", values_from = "total") %>%
    column_to_rownames("james_cell_type")

#convert NAs to 0s
all_cts[is.na(all_cts)] <- 0

round(prop.table(as.matrix(all_cts), margin = 2), 4)
            Late Nintedanib Early Nintedanib new control
AEC2                 0.1326           0.1505      0.0068
Mes                  0.0150           0.0052      0.0056
pAEC2                0.0378           0.0328      0.0020
DATP                 0.0184           0.0040      0.2991
AEC1                 0.1424           0.0124      0.0538
Mac                  0.0668           0.0855      0.0215
Endothelial          0.2492           0.1390      0.1140
FB                   0.1629           0.1845      0.0280
tumor                0.0371           0.0009      0.2108
Peri                 0.0253           0.0181      0.0120
Cil                  0.0152           0.0409      0.0070
Club                 0.0408           0.0869      0.0171
T cell               0.0182           0.0498      0.0312
B cell               0.0078           0.0109      0.0139
Epithelial           0.0062           0.0081      0.0342
10                   0.0037           0.0026      0.0064
11                   0.0070           0.0121      0.0215
SMC                  0.0012           0.0014      0.0011
primed AEC2          0.0075           0.1451      0.0725
endothelial          0.0018           0.0040      0.0317
12                   0.0025           0.0023      0.0044
Mono                 0.0002           0.0003      0.0014
Bas                  0.0004           0.0020      0.0036
DC                   0.0000           0.0006      0.0005
Code
niche_cells <- subset(merged_samples, james_cell_type != "tumor") %>%
    process_seurat()
Warning: Different features in new layer data than already exists for
scale.data
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Code
niche_cells$treatment <- recode(niche_cells$sample_id,
                            "S0291" = "Control",
                            "S0283" = "Late Nintedanib",
                            "S0284" = "Early Nintedanib")

Cell types that are in niche_cells now

“AEC1” = “Epithelial”, “Mes” = “Mesothelial”, “TAMs” = “Monocyte/Macrophage”, “DATP” = “Epithelial”, “AEC2” = “Epithelial”, “Mac” = “Monocyte/Macrophage”, “Endothelial” = “Endothelial”, “FB” = “Fibroblast”, “endothelial” = “Endothelial”, “Scar-TAMs” = “Monocyte/Macrophage”, “Peri” = “Smooth muscle cells”, “Cil” = “Epithelial”, “Club” = “Epithelial”, “Interstitial” = “Monocyte/Macrophage”, “T cell” = “T/NK cells”, “B cell” = “B cell”, “Epithelial” = “Epithelial”, “Mono” = “Monocyte/Macrophage”, “DC” = “Dendritic cell”, “SMC” = “Smooth muscle cells”, “pAEC2” = “Epithelial”, “primed AEC2” = “Epithelial”, “Bas” = “Granulocyte”, “Angio-TAMs” = “Monocyte/Macrophage”

Add in These Cell Types

Code
niche_cells$final_cell_type <-
    recode(niche_cells$james_cell_type,
           "AEC1" = "Epithelial",
           "Mes" = "Mesothelial",
           "TAMs" = "Monocyte/Macrophage",
           "DATP" = "Epithelial",
           "AEC2" = "Epithelial",
           "Mac" = "Monocyte/Macrophage",
           "Endothelial" = "Endothelial",
           "FB" = "Fibroblast",
           "endothelial" = "Endothelial",
           "Scar-TAMs" = "Monocyte/Macrophage",
           "Peri" = "Smooth muscle cells",
           "Cil" = "Epithelial",
           "Club" = "Epithelial",
           "Interstitial" = "Monocyte/Macrophage",
           "T cell" = "T/NK cells",
           "B cell" = "B cell",
           "Epithelial" = "Epithelial",
           "Mono" = "Monocyte/Macrophage",
           "DC" = "Dendritic cell",
           "SMC" = "Smooth muscle cells",
           "cAEC2" = "Epithelial",
           "pAEC2" = "Epithelial",
           "Bas" = "Granulocyte",
           "Angio-TAMs" = "Monocyte/Macrophage")

r_dim_plot(niche_cells, group.by = "final_cell_type")

Determine Proximal vs Distal Airway Cells

Code
epi_feats <- c("Scgb1a1", "Epcam", "Tuba1a", "Sftpd", "Ager", "Aqp5")

final_epi <- subset(niche_cells, final_cell_type == "Epithelial") %>%
    process_seurat()
Warning: Different features in new layer data than already exists for
scale.data
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Code
r_feature_plot(final_epi, features = epi_feats, coord.fixed = TRUE)
Coordinate system already present. Adding new coordinate system, which will
replace the existing one.

Code
r_dim_plot(final_epi)

Based on these pltos, specifically the expression of Scgb1a1 and Ager and Aqp5, I am led to believe that clusters 3 and 5 are proximal airway cells, and clusters 0, 1, 2, 4, 6, and 7 are distal airway cells.

Code
final_epi$final_cell_type <- recode(final_epi$seurat_clusters,
                                    `0` = "Distal airway cell",
                                    `1` = "Distal airway cell",
                                    `2` = "Distal airway cell",
                                    `3` = "Proximal airway cell",
                                    `4` = "Distal airway cell",
                                    `5` = "Proximal airway cell",
                                    `6` = "Distal airway cell",
                                    `7` = "Distal airway cell")

#now I need to add these back to my niche_cells object
niche_cells@meta.data[colnames(final_epi), ]$final_cell_type <-
    final_epi$final_cell_type

niche_cells$final_cell_type <- recode(niche_cells$final_cell_type,
                                      `1` = "Distal airway cell",
                                      `2` = "Proximal airway cell")

#UMAP of niche cells
r_dim_plot(niche_cells, group.by = "final_cell_type", split.by = "treatment")

Determine Alveolar Macs

Code
mac_feats <- c("Itgam", "Fcgr1", "Cx3cr1", "Cd68", "Itgax", "Mertk")
niche_macs <- subset(niche_cells, final_cell_type == "Monocyte/Macrophage") %>%
    process_seurat()
Warning: Different features in new layer data than already exists for
scale.data
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Code
#feature plot to differentiate alveolar macs from other macs
r_feature_plot(niche_macs, features = mac_feats, coord.fixed = TRUE, ncol = 3)
Coordinate system already present. Adding new coordinate system, which will
replace the existing one.

Code
r_dim_plot(niche_macs)

Based on this feature plot, I’m confident that clusters 2, 3, 5, and 7 are our alveolar macrophages.

Code
niche_macs$final_cell_type <-
    ifelse(niche_macs$seurat_clusters %in% c(2, 3, 5, 7),
           "Alveolar macrophage",
           "Monocyte/Macrophage")

Add Annotations Back to niche_cells

Code
niche_cells@meta.data[colnames(niche_macs), ]$final_cell_type <-
    niche_macs$final_cell_type

pdf("output/figures/nucleus/niche_cell_split_umap.pdf", height = 6, width = 15)
r_dim_plot(niche_cells, group.by = "final_cell_type", split.by = "treatment")
dev.off()
png 
  2 

Do nichenetr analysis to identify nintedanib targets

Read in required data for normal/tumor nichenetr analysis

Code
murine_aec <-
    qs::qread("output/rdata/murine_aec_post-filtering.qs") %>%
    RenameIdents(
        `0` = "DATP",
        `1` = "AEC2",
        `2` = "DATP",
        `3` = "pAEC2",
        `4` = "cAEC2",
        `5` = "AEC1",
        `6` = "cAEC2"
    ) %>%
    subset(sample_name == "S0066" | sample_name == "S0068")
murine_aec$cell_type_fine <- Idents(murine_aec)

b6_f420_combined_all <-
    qs::qread("output_old/rdata/b6_f420_combined_all.qs") %>%
    AddMetaData(qs::qread("misc/b6_f420_assignments.qs"))

DefaultAssay(b6_f420_combined_all) <- "RNA"
Idents(b6_f420_combined_all) <- "cell_type_final"

f420_cells <-
    qs::qread("output/rdata/sobj_list.qs")[["F420"]] %>%
    AddMetaData(
        qs::qread("misc/b6_f420_assignments.qs") %>%
            rownames_to_column("cells") %>%
            filter(grepl("F420", cells)) %>%
            mutate(cells = str_remove(cells, ".+_")) %>%
            column_to_rownames("cells")
    ) %>%
    subset(cell_type_final == "F420" & seurat_clusters != "15")

mouse_data <-
    subset(
        b6_f420_combined_all,
        idents = c(
            "Monocyte/Macrophage",
            "Alveolar macrophage",
            "T/NK cell",
            "B cell",
            "Dendritic cell"
        )
    ) %>%
    merge(y = list(murine_aec, f420_cells)) %>%
    process_seurat() %>%
    JoinLayers()
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Code
Idents(mouse_data) <- "cell_type_final"
mouse_data[["SCT"]] <- NULL

r_dim_plot(
    mouse_data,
    group.by = "cell_type_final",
    label = TRUE,
    split.by = "obj_name"
)

Prep nichenetr reference data

Code
rrrSingleCellUtils:::load_lig_receptor_data()
lr_network <-
    rrrSingleCellUtils:::rrr_env$lr_network_strict %>%
    mutate(from = nichenetr::convert_human_to_mouse_symbols(from),
           to = nichenetr::convert_human_to_mouse_symbols(to)) %>%
    drop_na()

ligand_target_matrix <-
    rrrSingleCellUtils:::rrr_env$ligand_target_matrix

colnames(ligand_target_matrix) <-
    ligand_target_matrix %>%
    colnames() %>%
    convert_human_to_mouse_symbols()

rownames(ligand_target_matrix) <-
    ligand_target_matrix %>%
    rownames() %>%
    convert_human_to_mouse_symbols()

ligand_target_matrix <-
    ligand_target_matrix %>%
    .[!is.na(rownames(ligand_target_matrix)),
        !is.na(colnames(ligand_target_matrix))]

weighted_networks_lr <-
    rrrSingleCellUtils:::rrr_env$weighted_networks_lr %>%
    mutate(from = convert_human_to_mouse_symbols(from),
           to = convert_human_to_mouse_symbols(to)) %>%
    drop_na()

Calculate DE between primary and met F420 to get DE list to use for nichenetr

Code
tibia_f420 <-
    qs::qread("output/rdata/sobj_list.qs")$tibia_F420

ref_path <- "/home/gdrobertslab/lab/GenRef/sc_ref_datasets/mouse"
mouse_lung_ref <- qs::qread(paste0(ref_path, "/GSE151974/mouse_lung_ref.qs"))
tumor_ref <- qs::qread("output/rdata/f420_reference.qs")
mouse_immune <- celldex::ImmGenData()
see ?celldex and browseVignettes('celldex') for documentation
loading from cache
see ?celldex and browseVignettes('celldex') for documentation
loading from cache
Code
# This label is not informative
mouse_immune <-
    mouse_immune[mouse_immune$label.main != "Stromal cells", ]
mouse_rna <- celldex::MouseRNAseqData()
see ?celldex and browseVignettes('celldex') for documentation
loading from cache
see ?celldex and browseVignettes('celldex') for documentation
loading from cache
Code
cell_assign <-
    SingleR::SingleR(
        as.SingleCellExperiment(tibia_f420),
        ref = list(
            GetAssayData(mouse_lung_ref),
            mouse_immune,
            mouse_rna,
            GetAssayData(tumor_ref)
        ),
        labels = list(
            mouse_lung_ref$cell_type,
            mouse_immune$label.main,
            mouse_rna$label.fine,
            tumor_ref$cell_type
        ),
        aggr.ref = TRUE
    )
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth =
TRUE, : You're computing too large a percentage of total singular values, use a
standard svd instead.
Warning: did not converge in 10 iterations
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth =
TRUE, : You're computing too large a percentage of total singular values, use a
standard svd instead.
Warning: did not converge in 10 iterations
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth =
TRUE, : You're computing too large a percentage of total singular values, use a
standard svd instead.
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth =
TRUE, : You're computing too large a percentage of total singular values, use a
standard svd instead.
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth =
TRUE, : You're computing too large a percentage of total singular values, use a
standard svd instead.
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth =
TRUE, : You're computing too large a percentage of total singular values, use a
standard svd instead.
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth =
TRUE, : You're computing too large a percentage of total singular values, use a
standard svd instead.
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth =
TRUE, : You're computing too large a percentage of total singular values, use a
standard svd instead.
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth =
TRUE, : You're computing too large a percentage of total singular values, use a
standard svd instead.
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth =
TRUE, : You're computing too large a percentage of total singular values, use a
standard svd instead.
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth =
TRUE, : You're computing too large a percentage of total singular values, use a
standard svd instead.
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth =
TRUE, : You're computing too large a percentage of total singular values, use a
standard svd instead.
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth =
TRUE, : You're computing too large a percentage of total singular values, use a
standard svd instead.
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth =
TRUE, : You're computing too large a percentage of total singular values, use a
standard svd instead.
Code
tibia_f420$cell_type_final <- cell_assign$labels
tibia_f420$cell_score <- cell_assign$scores %>%
    apply(MARGIN = 1, function(x) max(x, na.rm = TRUE))

DimPlot(
    tibia_f420,
    group.by = c("seurat_clusters", "cell_type_final"),
    label = TRUE,
    repel = TRUE
) +
    NoLegend()
Warning: ggrepel: 17 unlabeled data points (too many overlaps). Consider
increasing max.overlaps

Code
tumor_clusters <-
    table(
        tibia_f420$cell_type_final,
        tibia_f420$seurat_clusters
    ) %>%
    prop.table(margin = 2) %>%
    as.data.frame() %>%
    filter(Var1 == "Tumor" & Freq > 0.9) %>%
    pull(Var2)

tibia_f420 <-
    subset(tibia_f420, seurat_clusters %in% tumor_clusters)

# adding in b6 here so it's easy to add cell IDs to match assignments
b6_f420_combined_t <-
    merge(
        f420_cells,
        tibia_f420,
        add.cell.ids = c("", "tibia")
    ) %>%
    JoinLayers() %>%
    subset(cell_type_final == "F420" | cell_type_final == "Tumor") %>%
    process_seurat()
Warning: Different features in new layer data than already exists for
scale.data
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Code
Idents(b6_f420_combined_t) <- "cell_type_final"

celltype_de <- list()
celltype_de[["F420"]] <-
    FindMarkers(
        b6_f420_combined_t,
        ident.1 = "F420 primary tumor in tibia",
        group.by = "sample_description"
    ) %>%
    filter(p_val_adj < 0.05)

plot_name <-
    DimPlot(
        b6_f420_combined_t,
        group.by = "cell_type_final",
        label = TRUE,
        repel = TRUE
    )
ggsave("output/figures/f420_prim_met_dim.png", plot_name, width = 15, height = 10)

plot_name <-
    DimPlot(
        b6_f420_combined_t,
        group.by = "cell_type",
        label = TRUE,
        repel = TRUE
    ) +
    NoLegend()
ggsave("stuff_f420_cell_type_no_legend.png", plot_name, width = 15, height = 10)

plot_name <-
    DimPlot(
        b6_f420_combined_t,
        split.by = "sample_name",
        cells.highlight = WhichCells(b6_f420_combined_t, idents = "F420")
    )

ggsave("stuff_f420_cells_split.png", plot_name, width = 15, height = 10)

DimPlot(
    b6_f420_combined_t,
    group.by = "cell_type",
    label = TRUE,
    repel = TRUE
) +
    NoLegend()
Warning: ggrepel: 13 unlabeled data points (too many overlaps). Consider
increasing max.overlaps

Code
all_cell_types <- unique(mouse_data$cell_type_final)

set.seed(1337)

pdf("output/figures/nichenetr/nichenet_junk_plots.pdf")
nichenet_out <-
    parallel::mclapply(
        all_cell_types,
        mc.cores = parallelly::availableCores(),
        mc.preschedule = FALSE,
        function(receiver_cell) {
            sender_cells <-
                grep(
                    receiver_cell,
                    unique(mouse_data$cell_type_final),
                    value = TRUE,
                    invert = TRUE
                )

            subset_data <-
                subset(
                    mouse_data,
                    cell_type_final == receiver_cell
                )

            if (receiver_cell != "F420") {
                celltype_de[[receiver_cell]] <-
                    subset(mouse_data, cell_type_final == receiver_cell) %>%
                    FindMarkers(group.by = "sample",
                                ident.1 = "C57BL6") %>%
                    filter(p_val_adj < 0.05)
            }

            celltype_de[[receiver_cell]] %>%
            rownames_to_column("gene") %>%
            write_tsv(paste0("output/tables/nichenetr/",
                             str_replace(receiver_cell, "/", "_"),
                             "_de.tsv"))

            tumor_upregulated <-
                celltype_de[[receiver_cell]] %>%
                filter(avg_log2FC < 0) %>%
                rownames_to_column("gene") %>%
                pull(gene)

            if (length(tumor_upregulated) < 10) {
                return("Not enough DE genes")
            }

            nichenet_out <-
                find_ligands(
                    mouse_data,
                    stringency = "strict",
                    tumor_upregulated,
                    receiver = receiver_cell,
                    senders = sender_cells,
                    gset_spec = "mouse",
                    rec_spec = "mouse",
                    send_spec = "mouse"
                )

            return(nichenet_out)
        })
dev.off()
png 
  2 
Code
names(nichenet_out) <- all_cell_types

qs::qsave(nichenet_out, "output/rdata/nichenetr_out.qs")

Make complex heatmaps from nichenetr output

Code
for (receiver_cell in all_cell_types) {
    if (is.list(nichenet_out[[receiver_cell]])) {
        plot_name <-
            plot_complex_heatmap(nichenet_out[[receiver_cell]]) +
            labs(
                title = paste(
                    "Nichenetr results where",
                    receiver_cell,
                    "is the receiver cell"
                )
            )

        ggsave(
            paste0("output/figures/nichenetr/nichenet_heatmap_",
                    str_replace(receiver_cell, "/", "_"),
                    ".pdf"),
            plot_name,
            width = 15,
            height = 6
        )

        qs::qsave(
            plot_name,
            paste0(
                "output/figures/nichenetr/nichenet_heatmap_",
                str_replace(receiver_cell, "/", "_"),
                ".qs"
            )
        )
    } else {
        print(paste(receiver_cell, "-", nichenet_out[[receiver_cell]]))
    }
}
Warning: Removed 1181 rows containing missing values or values outside the scale range
(`geom_point()`).
Warning: Removed 741 rows containing missing values or values outside the scale range
(`geom_point()`).
Warning: Removed 1141 rows containing missing values or values outside the scale range
(`geom_point()`).
Warning: Removed 561 rows containing missing values or values outside the scale range
(`geom_point()`).
Warning: Removed 1121 rows containing missing values or values outside the scale range
(`geom_point()`).
Removed 1121 rows containing missing values or values outside the scale range
(`geom_point()`).
Warning: Removed 501 rows containing missing values or values outside the scale range
(`geom_point()`).

Make the same complexheatmap plots, but only for nint targets

Code
nint_receptors <-
    c(
        "Fgfr1", "Fgfr2", "Fgfr3", "Fgfr4",
        "Pdgfra", "Pdgfrb",
        "Src",
        "Flt1", # VEGFR1
        "Kdr", # VEGFR2
        "Flt4", # VEGFR3
        "Flt3",
        "Lck",
        "Lyn"
    )

rrrSingleCellUtils:::load_lig_receptor_data()
nint_ligands <-
    rrrSingleCellUtils:::rrr_env$lr_network_strict %>%
    mutate(from = nichenetr::convert_human_to_mouse_symbols(from),
           to = nichenetr::convert_human_to_mouse_symbols(to)) %>%
    filter(to %in% nint_receptors) %>%
    select(-source, -database) %>%
    dplyr::rename(ligand = "from", receptor = "to") %>%
    pull(ligand) %>%
    unique()

slot_1_data <- data.frame(id = character())
slot_2_data <- data.frame(id = character())
slot_4_data <- data.frame(y = ordered())

score_cutoff <- 0.2

for (receiver_cell in all_cell_types) {
    if (is.list(nichenet_out[[receiver_cell]])) {
        sub_nichenetr <- nichenet_out[[receiver_cell]]

        slot_4_data <-
            rbind(
                slot_4_data,
                filter(sub_nichenetr[[4]]$data, x %in% nint_receptors &
                       y %in% nint_ligands &
                       score > score_cutoff) %>%
                    mutate(x = paste(receiver_cell, x))
            )

        sub_nichenetr[[1]]$data <-
            filter(sub_nichenetr[[1]]$data, features.plot %in% nint_ligands)

        slot_1_data <-
            rbind(slot_1_data, sub_nichenetr[[1]]$data)

        sub_nichenetr[[2]]$data <-
            filter(
                sub_nichenetr[[2]]$data,
                features.plot %in% nint_receptors
            ) %>%
            mutate(id = "Receptor Expression")

        if (nrow(sub_nichenetr[[2]]$data) > 0) {
            rownames(sub_nichenetr[[2]]$data) <-
                paste(receiver_cell, rownames(sub_nichenetr[[2]]$data))

            slot_2_data <-
                rbind(
                    slot_2_data,
                    mutate(
                        sub_nichenetr[[2]]$data,
                        features.plot = paste(receiver_cell, features.plot)
                    )
                )
        }

    } else {
        print(paste(receiver_cell, "-", nichenet_out[[receiver_cell]]))
    }
}

# Pf4 had really low scores (~0.02) and was filtered out of slot_4_data, but not
# from slot_1_data, so we're doing it here
slot_1_data <-
    filter(slot_1_data,
           features.plot %in% slot_4_data$y)
holder_list <-
    list(
        list(data = slot_1_data),
        list(data = slot_2_data),
        NULL,
        list(data = slot_4_data)
    )

plot_name <-
    plot_complex_heatmap(holder_list) +
    theme(legend.direction = "vertical", legend.box = "horizontal")

ggsave(
    "output/figures/nichenetr/nichenet_heatmap_main.pdf",
    plot_name,
    width = 7,
    height = 5
)
Warning: Removed 6 rows containing missing values or values outside the scale range
(`geom_point()`).
Code
ligand_receptors <-
    rrrSingleCellUtils:::rrr_env$lr_network_strict %>%
    mutate(from = nichenetr::convert_human_to_mouse_symbols(from),
           to = nichenetr::convert_human_to_mouse_symbols(to)) %>%
    filter(to %in% nint_receptors) %>%
    select(-source, -database) %>%
    dplyr::rename(ligand = "from", receptor = "to")

nint_ligands <- unique(ligand_receptors$ligand)

int_score_min <- 0.2

Idents(mouse_data) <- "cell_type_final"

expressed_receptors <- list()
for (receiver_cell in all_cell_types) {
    if (is.list(nichenet_out[[receiver_cell]])) {
        nn_receptors <-
            nichenet_out[[receiver_cell]][[7]] %>%
            rownames()
        nn_nint_receptors <- nn_receptors[nn_receptors %in% nint_receptors]

        # Get ligands upstream of nn_nint_receptors
        nn_nint_ligands_receptor <-
            nichenet_out[[receiver_cell]][[7]] %>%
            as.data.frame() %>%
            rownames_to_column("receptor") %>%
            pivot_longer(
                -receptor,
                names_to = "ligand",
                values_to = "weight"
            ) %>%
            filter(receptor %in% nn_nint_receptors & weight > int_score_min) %>%
            mutate(receiver_cell = receiver_cell)

        expressed_receptors[[receiver_cell]] <- nn_nint_ligands_receptor
    }
}

nn_receptor_ligands <-
    bind_rows(expressed_receptors)

sub_l_t_mat <-
    ligand_target_matrix[, unique(c(nn_receptor_ligands$ligand, "Pdgfa", "Pdgfb"))] %>%
    as.data.frame() %>%
    rownames_to_column("gene") %>%
    pivot_longer(
        -gene,
        names_to = "ligand",
        values_to = "score"
    )

score_cutoff <- 0.002

sub_l_t_mat %>%
    ggplot(aes(x = score)) +
    geom_histogram(bins = 200) +
    facet_wrap(~ ligand, ncol = 1) +
    scale_x_log10() +
    geom_vline(xintercept = score_cutoff)
Warning in scale_x_log10(): log-10 transformation introduced infinite values.
Warning: Removed 775 rows containing non-finite outside the scale range
(`stat_bin()`).

Code
sub_l_t_mat <-
    sub_l_t_mat %>%
    filter(score > score_cutoff)

gene_lists <-
    sub_l_t_mat %>%
    select(-score) %>%
    pivot_wider(names_from = ligand, values_from = gene) %>%
    as.list()
Warning: Values from `gene` are not uniquely identified; output will contain list-cols.
• Use `values_fn = list` to suppress this warning.
• Use `values_fn = {summary_fun}` to summarise duplicates.
• Use the following dplyr code to identify duplicates.
  {data} %>%
  dplyr::group_by(ligand) %>%
  dplyr::summarise(n = dplyr::n(), .groups = "drop") %>%
  dplyr::filter(n > 1L)
Code
gene_lists <-
    list(
        Fgf1_downstream = gene_lists$Fgf1[[1]],
        Vegfa_downstream = gene_lists$Vegfa[[1]],
        Pdgfa_downstream = gene_lists$Pdgfa[[1]],
        Pdgfb_downstream = gene_lists$Pdgfb[[1]]
    )

qs::qsave(gene_lists, "output/rdata/nichenetr_gene_lists.qs")

mouse_data <-
    AddModuleScore(
        mouse_data,
        features = gene_lists,
        name = paste0(names(gene_lists), "_")
    )
Warning: The following features are not present in the object: Fgf3, Prap1, not
searching for symbol synonyms
Warning: The following features are not present in the object: COX1, COX2,
Cyp2c29, Epo, Mmp1a, Nkx2-5, Prap1, Serpina1a, not searching for symbol
synonyms
Code
FeaturePlot(
    mouse_data,
    features = c(
        "Fgf1_downstream_1",
        "Vegfa_downstream_3",
        "Pdgfa_downstream_4",
        "Pdgfb_downstream_5"
    )
)
Warning: The following requested variables were not found: Vegfa_downstream_3,
Pdgfa_downstream_4, Pdgfb_downstream_5

Gather nuclear data

Code
james_tams <- qs::qread("output/rdata/nucleus/myeloid_processed.qs")
james_tams$coarse_cell_type <- "TAM"

james_epi <- qs::qread("output/rdata/nucleus/new_small_epi.qs")

# In the code that created the myeloid_processed.qs file, the macrophages were
# subset to exclude everything we're bringing back in here
non_james_cells <-
    qs::qread("output/rdata/nucleus/merged_samples.qs") %>%
    subset(
        cells = setdiff(Cells(.), Cells(james_tams)) %>%
            setdiff(Cells(james_epi))
    ) %>%
    subset(coarse_cell_type != "Epithelial")

non_james_cells$treatment <-
    recode(
        non_james_cells$sample_id,
        "S0291" = "Control",
        "S0283" = "Late Nintedanib",
        "S0284" = "Early Nintedanib"
    )

non_james_cells$coarse_cell_type[non_james_cells$sn_cell_type == "AM"] <- "AM"
non_james_cells$coarse_cell_type[non_james_cells$sn_cell_type == "IM"] <- "IM"

set.seed(1337)
gene_lists$random_genes <- sample(rownames(james_epi), 30)
gene_lists$random_genes_2 <- sample(rownames(james_epi), 100)

epi_myeloid_data <-
    merge(
        james_tams,
        list(james_epi, non_james_cells)
    ) %>%
    JoinLayers() %>%
    process_seurat() %>%
    AddModuleScore(
        features = gene_lists,
        name = paste0(names(gene_lists), "_")
    )
Warning: Some cell names are duplicated across objects provided. Renaming to
enforce unique cell names.
Warning: Different features in new layer data than already exists for
scale.data
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Warning: The following features are not present in the object: Calb1, Fgf3,
Mmp7, Mtnr1a, Ucp1, not searching for symbol synonyms
Warning: The following features are not present in the object: Alppl2, COX1,
COX2, Ctgf, Cyp2c29, Dusp15, Igf2, Mmp1a, Mmp20, Nkx2-5, Serpina1a, not
searching for symbol synonyms
Warning: The following features are not present in the object: Arc, not
searching for symbol synonyms
Code
plot_name <-
    DimPlot(
        epi_myeloid_data,
        group.by = "coarse_cell_type",
        split.by = "treatment",
        label = TRUE
    ) +
    NoLegend() +
    coord_fixed()

ggsave(
    "output/figures/nichenetr/epi_myeloid_treatment_dimplot.pdf",
    plot_name,
    width = 15,
    height = 10
)

qs::qsave(
    plot_name,
    "output/figures/nichenetr/epi_myeloid_treatment_dimplot.qs"
)

FeaturePlot(
    epi_myeloid_data,
    features = c(
        "Fgf1_downstream_1",
        "Vegfa_downstream_2",
        "Pdgfa_downstream_3",
        "Pdgfb_downstream_4"
    )
)

Code
VlnPlot(
    epi_myeloid_data,
    features = c(
        "Fgf1_downstream_1",
        "Vegfa_downstream_2",
        "Pdgfa_downstream_3",
        "Pdgfb_downstream_4"),
    split.by = "treatment",
    group.by = "james_cell_type",
    ncol = 1
) +
    theme(legend.position = "top")
The default behaviour of split.by has changed.
Separate violin plots are now plotted side-by-side.
To restore the old behaviour of a single split violin,
set split.plot = TRUE.
      
This message will be shown once per session.

Code
aucell_out <-
    AUCell::AUCell_run(GetAssayData(epi_myeloid_data), gene_lists) %>%
    AUCell::getAUC() %>%
    as.data.frame() %>%
    t()

epi_myeloid_data <- AddMetaData(epi_myeloid_data, aucell_out)

plot_name <-
    VlnPlot(
        epi_myeloid_data,
        features = c(
            "Fgf1_downstream",
            "Vegfa_downstream",
            "Pdgfa_downstream",
            "Pdgfb_downstream",
            "random_genes"),
        split.by = "treatment",
        group.by = "coarse_cell_type",
        ncol = 1
    ) +
        theme(legend.position = "top")

ggsave(
    "output/figures/nichenetr/epi_myeloid_treatment_aucell_vln.pdf",
    plot_name,
    width = 10,
    height = 15
)
Warning: Groups with fewer than two datapoints have been dropped.
ℹ Set `drop = FALSE` to consider such groups for position adjustment purposes.
Warning: Groups with fewer than two datapoints have been dropped.
ℹ Set `drop = FALSE` to consider such groups for position adjustment purposes.
Groups with fewer than two datapoints have been dropped.
ℹ Set `drop = FALSE` to consider such groups for position adjustment purposes.
Groups with fewer than two datapoints have been dropped.
ℹ Set `drop = FALSE` to consider such groups for position adjustment purposes.
Groups with fewer than two datapoints have been dropped.
ℹ Set `drop = FALSE` to consider such groups for position adjustment purposes.
Groups with fewer than two datapoints have been dropped.
ℹ Set `drop = FALSE` to consider such groups for position adjustment purposes.
Groups with fewer than two datapoints have been dropped.
ℹ Set `drop = FALSE` to consider such groups for position adjustment purposes.
Groups with fewer than two datapoints have been dropped.
ℹ Set `drop = FALSE` to consider such groups for position adjustment purposes.
Groups with fewer than two datapoints have been dropped.
ℹ Set `drop = FALSE` to consider such groups for position adjustment purposes.
Groups with fewer than two datapoints have been dropped.
ℹ Set `drop = FALSE` to consider such groups for position adjustment purposes.
Code
qs::qsave(
    plot_name,
    "output/figures/nichenetr/epi_myeloid_treatment_aucell_vln.qs"
)

epi_myeloid_data$vln_group <-
    epi_myeloid_data$coarse_cell_type %>%
    str_replace("TAM", "Infiltrating Myeloid") %>%
    str_replace("Mono", "Infiltrating Myeloid") %>%
    str_replace("tumor", "Tumor")

plot_name <-
    epi_myeloid_data %>%
    subset(vln_group %in% c(
        "Tumor",
        "Infiltrating Myeloid",
        "Epithelial")) %>%
    VlnPlot(
        features = c(
            "Fgf1_downstream",
            "Vegfa_downstream",
            "Pdgfa_downstream",
            "Pdgfb_downstream"),
        split.by = "treatment",
        group.by = "vln_group",
        ncol = 1
    ) +
        theme(legend.position = "top")

ggsave(
    "output/figures/nichenetr/epi_myeloid_treatment_aucell_vln_main.pdf",
    plot_name,
    width = 10,
    height = 15
)
qs::qsave(
    plot_name,
    "output/figures/nichenetr/epi_myeloid_treatment_aucell_vln_main.qs"
)

Try doing statistical analysis on the AUCell data

Code
output <- data.frame(cell_type = character())
min_cells <- 20

for (downstream_targets in names(gene_lists)) {
    message(downstream_targets)
    for (second_group in c("Early Nintedanib", "Late Nintedanib")) {
        message(second_group)
        output <-
            parallel::mclapply(
                unique(epi_myeloid_data$coarse_cell_type),
                mc.cores = parallelly::availableCores(),
                mc.preschedule = FALSE,
                function(cell_type) {
                message(cell_type)

                stuff <-
                    data.frame(
                        cell_type = cell_type,
                        downstream_targets = downstream_targets,
                        group_1 = "Control",
                        group_2 = second_group
                    )

                fewest_cells <-
                    min(
                        sum(
                            epi_myeloid_data$coarse_cell_type == cell_type &
                                epi_myeloid_data$treatment == "Control"
                        ),
                        sum(
                            epi_myeloid_data$coarse_cell_type == cell_type &
                                epi_myeloid_data$treatment == second_group
                        )
                    )

                if (fewest_cells > min_cells) {
                    sub_data <-
                        subset(
                            epi_myeloid_data,
                            coarse_cell_type == cell_type &
                                treatment %in% c("Control", second_group),
                            seed = sample(1:1000000)
                        )

                    stuff$p_val <-
                        t.test(
                            get(downstream_targets) ~ treatment,
                            data = sub_data@meta.data
                        )$p.value

                    stuff$eff_size <-
                        effsize::cohen.d(
                            get(downstream_targets) ~ treatment,
                            data = sub_data@meta.data
                        )$estimate
                }

                return(stuff)
            }) %>%
            bind_rows(output)

        sub_data <-
            subset(
                epi_myeloid_data,
                vln_group == "Infiltrating Myeloid" &
                    treatment %in% c("Control", second_group),
                seed = sample(1:1000000)
            )

        stuff <-
            data.frame(
                cell_type = "Infiltrating Myeloid",
                downstream_targets = downstream_targets,
                group_1 = "Control",
                group_2 = second_group
            )

        stuff$p_val <-
            t.test(
                get(downstream_targets) ~ treatment,
                data = sub_data@meta.data
            )$p.value

        stuff$eff_size <-
            effsize::cohen.d(
                get(downstream_targets) ~ treatment,
                data = sub_data@meta.data
            )$estimate

        output <- bind_rows(output, stuff)

    }
}
Fgf1_downstream
Early Nintedanib
Late Nintedanib
Vegfa_downstream
Early Nintedanib
Late Nintedanib
Pdgfa_downstream
Early Nintedanib
Late Nintedanib
Pdgfb_downstream
Early Nintedanib
Late Nintedanib
random_genes
Early Nintedanib
Late Nintedanib
random_genes_2
Early Nintedanib
Late Nintedanib
Code
output$fdr_value <- p.adjust(output$p_val, method = "BH")

# FDR correction

write_tsv(output, "output/tables/nichenetr/nichenetr_effect_pvals.tsv")

Load in single Visium lung met sample

Read in raw data

ROB491D-1 SP0001 F420 lung met

Code
spatial_data <-
    Load10X_Spatial(
        "/home/gdrobertslab/lab/Counts_2/SP0001/outs/",
        filename = "filtered_feature_bc_matrix.h5",
        image = Read10X_Image(
            image.dir = "/home/gdrobertslab/lab/Counts_2/SP0001/outs/spatial",
            image.name = "tissue_hires_image.png"
        )
    ) %>%
    subset(nCount_Spatial > 0)
Warning: Not validating Centroids objects
Not validating Centroids objects
Warning: Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Warning: Not validating Seurat objects
Code
# Seurat hard codes that you're using lowres for the scale factors
# https://github.com/satijalab/seurat/issues/5614#issuecomment-1194555472
spatial_data@images$slice1@scale.factors$lowres <-
    spatial_data@images$slice1@scale.factors$hires

spatial_data <- AddMetaData(spatial_data, GetTissueCoordinates(spatial_data))

GetTissueCoordinates(spatial_data) %>%
    ggplot(aes(x = x, y = y)) +
    geom_point()

Code
# Filter out spots outside of the tissue
subset(spatial_data, x > 0 & y > 0) %>%
    SpatialPlot(features = "nCount_Spatial", pt.size.factor = 1) +
    theme_bw()
Warning: Not validating Centroids objects
Warning: Not validating Centroids objects
Warning: Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Warning: Not validating Seurat objects

Code
spatial_data <- subset(spatial_data, x > 0 & y > 0)
Warning: Not validating Centroids objects
Warning: Not validating Centroids objects
Warning: Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Warning: Not validating Seurat objects
Code
spatial_data$sample_id <- "SP0001"

Process the data

Code
spatial_data <-
    spatial_data %>%
    SCTransform(assay = "Spatial") %>%
    RunPCA(assay = "SCT") %>%
    FindNeighbors(dims = 1:30) %>%
    FindClusters(resolution = 1.5)  %>%
    RunUMAP(dims = 1:30)
Running SCTransform on assay: Spatial
Running SCTransform on layer: counts
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Variance stabilizing transformation of count matrix of size 19216 by 1637
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 1637 cells
Found 105 outliers - those will be ignored in fitting/regularization step
Second step: Get residuals using fitted parameters for 19216 genes
Computing corrected count matrix for 19216 genes
Calculating gene attributes
Wall clock passed: Time difference of 12.46883 secs
Determine variable features
Centering data matrix
Set default assay to SCT
PC_ 1 
Positive:  Col1a1, Serpinf1, Thbs2, Mif, Fbln2, Pabpc1, Il1rl1, Ccl7, Col5a2, Lgals1 
       Col1a2, Cdkn2a, Cthrc1, Loxl2, C1qtnf6, Col3a1, Pgk1, Col6a1, Timp1, Tnc 
       Stra6, Sdc2, Col5a1, Postn, Col6a2, Enpp2, Fgfr1, Col5a3, Tnn, Medag 
Negative:  Inmt, Epas1, Ager, Hba-a2, Cxcl15, Sftpa1, Chil1, Slc34a2, Lrg1, Sec14l3 
       Mfap4, Tspan7, Slc9a3r2, Hbb-bs, Calcrl, Cd36, Scd1, Hpgd, Clic5, Tmem100 
       Lyz1, Scn7a, Sftpd, Gpx3, Car4, Cyp4b1, Il33, Cldn5, Fmo2, Sema3c 
PC_ 2 
Positive:  C1qtnf3, Igfbp3, Inmt, Postn, Col12a1, Itm2a, Igfbp5, Fndc1, Cthrc1, Epas1 
       P4ha2, Col5a1, C1qtnf6, Ndufa4l2, Col11a1, Aspn, Hpgd, Mxra8, Sparc, Rcn3 
       Tmem100, Loxl2, Fbn2, P4ha1, Cpxm1, Olfml2b, Itga11, Lrrc17, Col5a2, Cldn5 
Negative:  Spp1, Ctss, Apoe, Ctsd, Gpnmb, C3ar1, C1qb, Cd68, S100a4, Lgals3 
       Trem2, C1qc, C1qa, Ctsb, Arg1, Hmox1, Lpl, Ccl9, Lgmn, Psap 
       Mpeg1, Alox5ap, Tyrobp, Fcer1g, Hexb, Cd84, Itgam, Ccr2, Ncf1, Adam8 
PC_ 3 
Positive:  Il1rl1, Inmt, Ccl2, Stra6, Hist1h2bk, Ccl7, Epas1, Hist1h2bj, Neto2, Hpgd 
       Hspd1, C1qbp, Nhp2, Odaph, Dcn, Cct5, Gstt1, Hist1h3d, Psat1, Hist2h2ac 
       Tspan7, Acpp, Ppa1, Snu13, Stip1, Hist1h3c, Hist1h1b, Cldn5, Hsph1, Hist1h1a 
Negative:  Itm2a, Col12a1, Fabp5, C1qtnf3, Cpxm1, Acta2, Fndc1, Col3a1, Igfbp5, Col11a1 
       Itga11, Aspn, Ndufa4l2, Ctsb, Hmox1, Rcn3, Mmp14, Ctsk, Ctsd, Igfbp3 
       Mmp2, Gpnmb, Mmp13, Apoe, P4ha2, Slc16a3, Mt1, Lpl, C1qc, Ccn3 
PC_ 4 
Positive:  Retnla, Acta2, Mgp, Chil1, Sftpa1, Eln, Lyz1, Chia1, Cxcl15, C3 
       Sftpd, C4b, Serpina3n, Lcn2, Clu, Il33, Actg2, Slc34a2, Sparcl1, Tpm2 
       Tagln, Areg, Lrg1, Hc, Gclc, Lamp3, Dpt, Marco, Myl9, Scd1 
Negative:  Cxcl10, Ifit1, Irf7, Ifit2, Oasl2, Rsad2, Ifit3, Usp18, Rnf213, Igtp 
       Ifi44, Bst2, Oasl1, Stat1, Gbp2, Oas3, Irgm2, Ifi211, Cmpk2, Rtp4 
       Dhx58, Gbp3, Xaf1, Ifi47, Irgm1, Slfn5, Parp14, Zbp1, Slfn1, Phf11d 
PC_ 5 
Positive:  Apoe, Gpnmb, Ctss, Trem2, Ctsd, Inmt, Ctsb, Hpgd, C1qc, Cd68 
       C1qb, Psap, C3ar1, Itm2a, Thbd, Epas1, Pltp, Sema3c, Cd84, C1qa 
       Tmem100, Adgre5, Ndufa4l2, Tyrobp, Mfap4, Car4, Hexa, Cldn5, Slc11a1, Thy1 
Negative:  Ifit1, Cxcl10, Ifit3, Oasl2, Chil1, Rsad2, Usp18, Mgp, Sftpa1, Ifit2 
       Rnf213, Ifi44, Gbp2, Irf7, C3, Igtp, Cxcl15, Lyz1, Zbp1, Retnla 
       Il33, Rtp4, Irgm2, Ifi47, Oasl1, Iigp1, Cmpk2, Sftpd, Gbp3, Xaf1 
Computing nearest neighbor graph
Computing SNN
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 1637
Number of edges: 61589

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.6554
Number of communities: 12
Elapsed time: 0 seconds
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
14:29:52 UMAP embedding parameters a = 0.9922 b = 1.112
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
14:29:52 Read 1637 rows and found 30 numeric columns
14:29:52 Using Annoy for neighbor search, n_neighbors = 30
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
14:29:52 Building Annoy index with metric = cosine, n_trees = 50
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
14:29:52 Writing NN index file to temp file /gpfs0/scratch/5855649/RtmpJig9Nt/file2c96174efc6544
14:29:52 Searching Annoy index using 1 thread, search_k = 3000
14:29:53 Annoy recall = 100%
14:29:54 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
14:29:57 Initializing from normalized Laplacian + noise (using RSpectra)
14:29:57 Commencing optimization for 500 epochs, with 67116 positive edges
14:30:00 Optimization finished

Add in module for tumor marker genes

Code
tumor_features <- c("Runx2", "Fn1", "Mki67", "Col1a1")
spatial_data <-
    AddModuleScore(spatial_data, list(tumor_features), name = "tumor_markers")

Make some plots of the data

Code
plot_name <-
    SpatialDimPlot(
        spatial_data,
        pt.size = 2,
        label = TRUE
    )
Scale for fill is already present.
Adding another scale for fill, which will replace the existing scale.
Code
ggsave(
    "output/figures/spatial/spatial_dimplot_plot.pdf",
    plot_name,
    width = 15,
    height = 15
)
qs::qsave(plot_name, file = "output/figures/spatial/spatial_dimplot_plot.qs")

plot_name <-
    DimPlot(
        spatial_data,
        label = TRUE,
        repel = TRUE
    )
ggsave(
    "output/figures/spatial/spatial_umap_plot_by_cluster.pdf",
    plot_name,
    width = 15,
    height = 10
)
qs::qsave(
    plot_name,
    file = "output/figures/spatial/spatial_umap_plot_by_cluster.qs"
)

plot_name <-
    SpatialPlot(
        spatial_data,
        features = tumor_features,
        pt.size.factor = 2
    )
ggsave(
    "output/figures/spatial/spatial_plot_by_feature.pdf",
    plot_name,
    width = 15,
    height = 15
)
qs::qsave(plot_name, file = "output/figures/spatial/spatial_plot_by_feature.qs")

vln_plot <-
    VlnPlot(
        spatial_data,
        features = "tumor_markers1",
        split.by = "seurat_clusters",
        cols = sample(rainbow(50))
    )
The default behaviour of split.by has changed.
Separate violin plots are now plotted side-by-side.
To restore the old behaviour of a single split violin,
set split.plot = TRUE.
      
This message will be shown once per session.
Code
ggsave(
    "output/figures/spatial/vln_tumor_markers_plot.pdf",
    vln_plot,
    width = 10,
    height = 5
)
qs::qsave(vln_plot, file = "output/figures/spatial/vln_tumor_markers_plot.qs")

Split out tumor spots

Code
col_cutoff <- 1

tumor_clusters <-
    spatial_data@meta.data %>%
    select(tumor_markers1, seurat_clusters) %>%
    rownames_to_column("spot") %>%
    group_by(seurat_clusters) %>%
    dplyr::summarize(tumor_median = median(tumor_markers1)) %>%
    arrange(desc(tumor_median)) %>% # Put them in order so we can use this later
    filter(tumor_median > col_cutoff) %>%
    pull(seurat_clusters)

spatial_data$tumor <- spatial_data$seurat_clusters %in% tumor_clusters

tumor_only <- subset(spatial_data, subset = tumor == TRUE)
Warning: Not validating Centroids objects
Not validating Centroids objects
Warning: Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Warning: Not validating Seurat objects
Code
plot_1 <-
    SpatialPlot(
        spatial_data,
        group.by = "tumor",
        pt.size.factor = 1.5,
        label = TRUE
    )
Scale for fill is already present.
Adding another scale for fill, which will replace the existing scale.
Code
plot_2 <-
    SpatialPlot(
        tumor_only,
        group.by = "seurat_clusters",
        pt.size.factor = 1.5,
        label = TRUE
    )
Scale for fill is already present.
Adding another scale for fill, which will replace the existing scale.
Code
plot_3 <-
    VlnPlot(
        spatial_data,
        features = "tumor_markers1",
        group.by = "seurat_clusters"
    )

merged_plot <- patchwork::wrap_plots(plot_1, plot_2, plot_3, ncol = 2)

ggsave(
    "output/figures/spatial/tumor_clusters_vln.pdf",
    merged_plot,
    width = 10,
    height = 10
)
qs::qsave(merged_plot, file = "output/figures/spatial/tumor_clusters_vln.qs")

Save off the data

Code
qs::qsave(spatial_data, file = "output/rdata/spatial_data.qs")

Generate a reference of F420 tumor cells from lung metastasis

Code
data_path <- "/home/gdrobertslab/lab/Counts_2/"
end_path <- "/filtered_feature_bc_matrix"

obj_name <- "S0261"

sobj <-
    tenx_load_qc(
        paste0(
            data_path,
            obj_name,
            end_path
        ),
        min_cells = 3,
        min_features = 200,
        violin_plot = FALSE
    )

# Add sample name to dataset
sobj$sample <- obj_name

sobj <- sobj %>%
    subset(nCount_RNA <= 100000 & percent.mt <= 40) %>%
    process_seurat()
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Code
# Add in barcode info
lt_barcodes <-
    gen_cellecta_bc_data(
        file = paste0(
            data_path,
            obj_name,
            "/possorted_genome_bam.bam"
        ),
        verbose = TRUE,
        samtools_module = "SAMtools"
    )

sobj <- process_ltbc(sobj, cid_lt = lt_barcodes)

sobj$has_bc <- !is.na(sobj$lt)
tumor_cell_cluster <-
    table(sobj$has_bc, sobj$seurat_clusters) %>%
    as.data.frame() %>%
    filter(Var1 == TRUE) %>%
    arrange(desc(Freq)) %>%
    pull(Var2) %>%
    head(n = 1)

sobj <- subset(sobj, seurat_clusters == tumor_cell_cluster)

sobj$cell_type <- "Tumor"

qs::qsave(sobj, "output/rdata/f420_reference.qs")

Run cell deconvolution on spatial data with spacexr

Load reference data to create single cell matrix

Code
#macs <- qs::qread("output/rdata/final_murine_macs.qs")
#macs <- macs[, !is.na(macs$macs_assignment)]

macs <- qs::qread("output/rdata/macs_subclustered_cleaned.qs")

macs$cell_type <- macs$macs_assignment

# Epithelial cells
murine_aec <- qs::qread("output/rdata/murine_aec_post-filtering.qs")
murine_aec <-
    RenameIdents(
        murine_aec,
        `0` = "DATP",
        `1` = "AEC2",
        `2` = "DATP",
        `3` = "pAEC2",
        `4` = "cAEC2",
        `5` = "AEC1",
        `6` = "cAEC2"
    )

murine_aec$cell_type <- Idents(murine_aec)

ref_path <- "/home/gdrobertslab/lab/GenRef/sc_ref_datasets/mouse"
mouse_lung_ref <- qs::qread(paste0(ref_path, "/GSE151974/mouse_lung_ref.qs"))

# Subset out the epithelial and macrophage cells since we have those above
redundant_types <-
    c(
        "AT1",
        "AT2 1",
        "AT2 2",
        "DC1",
        "DC2",
        "Int Mf",
        "Mono"
    )
keep_cell_types <- setdiff(unique(mouse_lung_ref$cell_type), redundant_types)

mouse_lung_ref <-
    subset(mouse_lung_ref, cell_type %in% keep_cell_types)

f420 <- qs::qread("output/rdata/f420_reference.qs")

Merge and process reference data

Code
all_ref <-
    merge(mouse_lung_ref, list(macs, murine_aec, f420)) %>%
    JoinLayers() %>%
    process_seurat()
Warning: Different features in new layer data than already exists for
scale.data
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by 'BiocGenerics'
Code
all_ref <-
    DietSeurat(
        all_ref,
        data = FALSE,
        dimreducs = c("pca", "umap")
    )
Warning: The `data` argument of `DietSeurat()` is deprecated as of Seurat 5.0.0.
ℹ Please use the `layers` argument instead.
Code
all_ref@meta.data <-
    all_ref@meta.data[, c(
        "cell_type",
        "seurat_clusters",
        "nCount_RNA",
        "orig.ident",
        "nFeature_RNA",
        "sample_name",
        "model",
        "sample_description"
        )]

qs::qsave(all_ref, "output/rdata/spatial_reference.qs")

plot_name <-
    DimPlot(
        all_ref,
        group.by = "cell_type",
        label = TRUE,
        repel = TRUE
    ) +
    NoLegend()

ggsave("output/figures/spatial/spacexr_ref_data.pdf",
       plot_name,
       width = 10,
       height = 10)
qs::qsave(plot_name, "output/figures/spatial/spacexr_ref_data.qs")

Create the Reference object

Code
spacexr_reference <-
    spacexr::Reference(
        GetAssayData(all_ref, layer = "counts"),
        as.factor(all_ref$cell_type)
    )
Code
spatial_data <- qs::qread("output/rdata/spatial_data.qs")

coords <-
    GetTissueCoordinates(spatial_data) %>%
    select(-cell)

my_data <-
    spacexr::SpatialRNA(coords,
                        GetAssayData(spatial_data, layer = "counts"))

Run the RCTD analysis with spacexr

Code
rctd_obj <-
    spacexr::create.RCTD(my_data, spacexr_reference, max_cores = 2)
Begin: process_cell_type_info
process_cell_type_info: number of cells in reference: 48529
process_cell_type_info: number of genes in reference: 25479

               AEC1                AEC2              Alv Mf            Alveolar 
                 64                 338                2261                 374 
                Art            B cell 1            B cell 2               cAEC2 
                365                1541                  63                 104 
                Cap               Cap-a        CD4 T cell 1        CD4 T cell 2 
               4342                1787                1137                 199 
       CD8 T cell 1        CD8 T cell 2            Ciliated                Club 
                444                  33                 453                 214 
          cMonocyte Col13a1+ fibroblast Col14a1+ fibroblast             Cycling 
               1442                5536                1277                1161 
               DATP                 DC1           gd T cell            IFN-TAMs 
                594                 901                  65                1682 
               ILC2   Inflammatory-TAMs        Interstitial               Lymph 
                109                1103                 268                 185 
           Mast Ba2         Mesothelial       Myofibroblast          ncMonocyte 
                108                 263                 224                 527 
             Neut 1              Neut 2             NK cell     Osteoclast-TAMs 
               1078                 116                 176                 244 
              pAEC2          Pericyte 1          Pericyte 2              Pre-DC 
                143                 732                 693                2075 
          Scar-TAMs                 SMC                TAMs               Tumor 
               2795                3280                6312                1365 
               Vein 
                356 
Warning in asMethod(object): sparse->dense coercion: allocating vector of size
1.1 GiB
Warning in asMethod(object): sparse->dense coercion: allocating vector of size
1.2 GiB
End: process_cell_type_info
create.RCTD: getting regression differentially expressed genes: 
get_de_genes: AEC1 found DE genes: 252
get_de_genes: AEC2 found DE genes: 97
get_de_genes: Alv Mf found DE genes: 185
get_de_genes: Alveolar found DE genes: 160
get_de_genes: Art found DE genes: 252
get_de_genes: B cell 1 found DE genes: 162
get_de_genes: B cell 2 found DE genes: 275
get_de_genes: cAEC2 found DE genes: 111
get_de_genes: Cap found DE genes: 278
get_de_genes: Cap-a found DE genes: 312
get_de_genes: CD4 T cell 1 found DE genes: 158
get_de_genes: CD4 T cell 2 found DE genes: 136
get_de_genes: CD8 T cell 1 found DE genes: 120
get_de_genes: CD8 T cell 2 found DE genes: 424
get_de_genes: Ciliated found DE genes: 207
get_de_genes: Club found DE genes: 53
get_de_genes: cMonocyte found DE genes: 172
get_de_genes: Col13a1+ fibroblast found DE genes: 216
get_de_genes: Col14a1+ fibroblast found DE genes: 209
get_de_genes: Cycling found DE genes: 102
get_de_genes: DATP found DE genes: 109
get_de_genes: DC1 found DE genes: 121
get_de_genes: gd T cell found DE genes: 141
get_de_genes: IFN-TAMs found DE genes: 246
get_de_genes: ILC2 found DE genes: 128
get_de_genes: Inflammatory-TAMs found DE genes: 211
get_de_genes: Interstitial found DE genes: 181
get_de_genes: Lymph found DE genes: 193
get_de_genes: Mast Ba2 found DE genes: 188
get_de_genes: Mesothelial found DE genes: 220
get_de_genes: Myofibroblast found DE genes: 210
get_de_genes: ncMonocyte found DE genes: 271
get_de_genes: Neut 1 found DE genes: 299
get_de_genes: Neut 2 found DE genes: 127
get_de_genes: NK cell found DE genes: 112
get_de_genes: Osteoclast-TAMs found DE genes: 144
get_de_genes: pAEC2 found DE genes: 102
get_de_genes: Pericyte 1 found DE genes: 199
get_de_genes: Pericyte 2 found DE genes: 264
get_de_genes: Pre-DC found DE genes: 114
get_de_genes: Scar-TAMs found DE genes: 162
get_de_genes: SMC found DE genes: 208
get_de_genes: TAMs found DE genes: 125
get_de_genes: Tumor found DE genes: 268
get_de_genes: Vein found DE genes: 243
get_de_genes: total DE genes: 3012
create.RCTD: getting platform effect normalization differentially expressed genes: 
get_de_genes: AEC1 found DE genes: 475
get_de_genes: AEC2 found DE genes: 195
get_de_genes: Alv Mf found DE genes: 396
get_de_genes: Alveolar found DE genes: 373
get_de_genes: Art found DE genes: 523
get_de_genes: B cell 1 found DE genes: 411
get_de_genes: B cell 2 found DE genes: 646
get_de_genes: cAEC2 found DE genes: 250
get_de_genes: Cap found DE genes: 620
get_de_genes: Cap-a found DE genes: 624
get_de_genes: CD4 T cell 1 found DE genes: 394
get_de_genes: CD4 T cell 2 found DE genes: 362
get_de_genes: CD8 T cell 1 found DE genes: 308
get_de_genes: CD8 T cell 2 found DE genes: 937
get_de_genes: Ciliated found DE genes: 437
get_de_genes: Club found DE genes: 89
get_de_genes: cMonocyte found DE genes: 340
get_de_genes: Col13a1+ fibroblast found DE genes: 449
get_de_genes: Col14a1+ fibroblast found DE genes: 420
get_de_genes: Cycling found DE genes: 261
get_de_genes: DATP found DE genes: 231
get_de_genes: DC1 found DE genes: 242
get_de_genes: gd T cell found DE genes: 361
get_de_genes: IFN-TAMs found DE genes: 428
get_de_genes: ILC2 found DE genes: 274
get_de_genes: Inflammatory-TAMs found DE genes: 368
get_de_genes: Interstitial found DE genes: 357
get_de_genes: Lymph found DE genes: 476
get_de_genes: Mast Ba2 found DE genes: 342
get_de_genes: Mesothelial found DE genes: 417
get_de_genes: Myofibroblast found DE genes: 430
get_de_genes: ncMonocyte found DE genes: 485
get_de_genes: Neut 1 found DE genes: 521
get_de_genes: Neut 2 found DE genes: 234
get_de_genes: NK cell found DE genes: 289
get_de_genes: Osteoclast-TAMs found DE genes: 299
get_de_genes: pAEC2 found DE genes: 210
get_de_genes: Pericyte 1 found DE genes: 411
get_de_genes: Pericyte 2 found DE genes: 569
get_de_genes: Pre-DC found DE genes: 260
get_de_genes: Scar-TAMs found DE genes: 307
get_de_genes: SMC found DE genes: 498
get_de_genes: TAMs found DE genes: 269
get_de_genes: Tumor found DE genes: 596
get_de_genes: Vein found DE genes: 516
get_de_genes: total DE genes: 4850
Code
rctd_out <-
    spacexr::run.RCTD(rctd_obj, doublet_mode = "full")
fitBulk: decomposing bulk
chooseSigma: using initial Q_mat with sigma =  1
Likelihood value: 4624165.22940624
Sigma value:  0.84
Likelihood value: 4515363.24497431
Sigma value:  0.69
Likelihood value: 4424370.89985437
Sigma value:  0.61
Likelihood value: 4383012.66847486
Sigma value:  0.53
Likelihood value: 4348693.63879692
Sigma value:  0.45
Likelihood value: 4323319.24879029
Sigma value:  0.37
Likelihood value: 4308958.98481489
Sigma value:  0.33
Likelihood value: 4306542.58724448
Sigma value:  0.33
Code
qs::qsave(rctd_out, "output/rdata/rctd_out.qs")
Code
norm_weights <-
    spacexr::normalize_weights(rctd_out@results$weights)

spatial_data <-
    AddMetaData(spatial_data, norm_weights)

qs::qsave(spatial_data, "output/rdata/spatial_data_deconvoluted.qs")

for (cell_type in all_ref$cell_type %>% unique()) {
    subset_data <-
        spatial_data[, !is.na(spatial_data[[cell_type]])]

    plot_name <-
        SpatialFeaturePlot(
            subset_data,
            features = c(cell_type, "Tumor"),
            ncol = 2,
            pt.size.factor = 2.5
        )

    ggsave(paste0("output/figures/spatial/spacexr_",
                  cell_type,
                  ".pdf"),
           plot_name,
           width = 10,
           height = 10)
}

Plot

Code
tumor_clusters <-
    tumor_only$seurat_clusters %>%
        as.character() %>%
        unique()

for (cluster_number in tumor_clusters) {
    tumor_c_celltypes <-
        subset(spatial_data, seurat_clusters == cluster_number)@meta.data %>%
        select(colnames(norm_weights)) %>%
        as_tibble() %>%
        pivot_longer(everything()) %>%
        group_by(name) %>%
        mutate(median_value = median(value, na.rm = TRUE)) %>%
        ungroup() %>%
        mutate(name = factor(name) %>%
               fct_reorder(median_value),
               factor_number = as.numeric(name)) %>%
        filter(factor_number >= max(factor_number) - 10) %>%
        ggplot(aes(y = name, x = value)) +
        geom_boxplot() +
        geom_point() +
        labs(x = "Normalized Weight",
             y = "Cell Type",
             title = paste0("Tumor Cell Types in Cluster ", cluster_number))

    ggsave(paste0("output/figures/spatial/spacexr_tumor_cluster",
                  cluster_number,
                  "_celltypes.png"),
        tumor_c_celltypes,
        width = 8,
        height = 10)
    qs::qsave(tumor_c_celltypes,
               paste0("output/figures/spatial/spacexr_tumor_cluster",
                      cluster_number,
                      "_celltypes.qs"))
}
Warning: Not validating Centroids objects
Not validating Centroids objects
Warning: Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Warning: Not validating Seurat objects
Warning: Not validating Centroids objects
Not validating Centroids objects
Warning: Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Warning: Not validating Seurat objects
Warning: Not validating Centroids objects
Not validating Centroids objects
Warning: Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Warning: Not validating Seurat objects
Warning: Not validating Centroids objects
Not validating Centroids objects
Warning: Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Warning: Not validating Seurat objects
Warning: Not validating Centroids objects
Not validating Centroids objects
Warning: Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Warning: Not validating Seurat objects

Look at spatial correlation of cell types

Look at which celltype occur together frequently

Code
spatial_data <- qs::qread("output/rdata/spatial_data_deconvoluted.qs")
Code
cell_data <-
    spatial_data@meta.data %>%
    select(
        -orig.ident,
        -nCount_Spatial,
        -nFeature_Spatial,
        -nCount_SCT,
        -nFeature_SCT,
        -x,
        -y,
        -sample_id,
        -seurat_clusters,
        -SCT_snn_res.1.5,
        -tumor_markers1,
        -cell,
        -tumor
    )

cell_cor <- cor(cell_data, method = "spearman")
Code
pheatmap::pheatmap(
    cell_cor,
    filename = "output/figures/spatial/cell_cor_treated.png",
    width = 10,
    height = 10
)

plot_name <-
    cell_cor %>%
    as.data.frame() %>%
    rownames_to_column("celltype1") %>%
    mutate(celltype1 = as.factor(celltype1) %>%
               fct_reorder(Tumor)) %>%
    filter(celltype1 != "Tumor") %>%
    ggplot(aes(x = "",
               y = celltype1)) +
    geom_tile(aes(fill = Tumor)) +
    scale_fill_gradient2(
        low = "blue",
        high = "red",
        mid = "white",
        midpoint = 0,
        name = "Correlation with\ntumor cells"
    ) +
    labs(x = NULL, y = NULL)
ggsave(
    "output/figures/spatial/cell_cor_plot.png",
    plot_name,
    width = 6,
    height = 10
)
qs::qsave(plot_name, file = "output/figures/spatial/cell_cor_plot.qs")

Overlay nintedanib targets onto spatial data

Code
spatial_data <- qs::qread("output/rdata/spatial_data_deconvoluted.qs")

gene_lists <- qs::qread("output/rdata/nichenetr_gene_lists.qs")
Code
test <-
    AUCell::AUCell_run(GetAssayData(spatial_data), gene_lists) %>%
    AUCell::getAUC() %>%
    as.data.frame() %>%
    t()

spatial_data <- AddMetaData(spatial_data, test)

plot_name <-
    SpatialFeaturePlot(
        spatial_data,
        features = c(
            "Fgf1_downstream",
            "Vegfa_downstream",
            "Pdgfa_downstream",
            "Pdgfb_downstream"
            ),
        pt.size.factor = 2
    )
ggsave(
    "output/figures/spatial/nint_paths_featurePlot.pdf",
    plot_name,
    width = 10,
    height = 10
    )
qs::qsave(plot_name, "output/figures/spatial/nint_paths_featurePlot.qs")

plot_name <-
    VlnPlot(
        spatial_data,
        features = c(
            "Fgf1_downstream",
            "Vegfa_downstream",
            "Pdgfa_downstream",
            "Pdgfb_downstream"
            ),
        group.by = "seurat_clusters",
        ncol = 1
    )
ggsave(
    "output/figures/spatial/nint_paths_vlnPlot.pdf",
    plot_name,
    width = 6,
    height = 10
    )
Code
fgf1_cutoff <- 0.13

spatial_data@meta.data %>%
    ggplot(aes(x = Fgf1_downstream)) +
    geom_histogram(bins = 100) +
    geom_vline(xintercept = fgf1_cutoff, color = "red")

Code
spatial_data$high_fgf1 <- spatial_data$Fgf1_downstream > fgf1_cutoff

SpatialDimPlot(
    spatial_data,
    group.by = "high_fgf1"
)

Code
rctd_out <- qs::qread("output/rdata/rctd_out.qs")

norm_weights <-
    spacexr::normalize_weights(rctd_out@results$weights)


# This approach didn't seem to work well - pAEC2 came out as third
subset(spatial_data, Fgf1_downstream > fgf1_cutoff)@meta.data %>%
    select(colnames(norm_weights)) %>%
    as_tibble() %>%
    pivot_longer(everything()) %>%
    group_by(name) %>%
    mutate(median_value = median(value, na.rm = TRUE)) %>%
    ungroup() %>%
    mutate(name = factor(name) %>%
            fct_reorder(median_value),
            factor_number = as.numeric(name)) %>%
    filter(factor_number >= max(factor_number) - 10) %>%
    ggplot(aes(y = name, x = value)) +
    geom_boxplot() +
    geom_point()
Warning: Not validating Centroids objects
Not validating Centroids objects
Warning: Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Not validating FOV objects
Warning: Not validating Seurat objects

Code
plot_list <-
lapply(
    colnames(norm_weights),
    function(gene) {
    spatial_data@meta.data %>%
        ggplot(aes(x = get(gene), y = Fgf1_downstream)) +
        geom_point() +
        labs(x = gene)
    })
wrap_plots(plot_list)

Code
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.9 (Ootpa)

Matrix products: default
BLAS:   /usr/lib64/libblis.so.2.1.0 
LAPACK: /gpfs0/export/apps/opt/R/4.3.0-foss-2020a/lib64/R/lib/libRlapack.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: US/Eastern
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] celldex_1.12.0              SingleR_2.4.1              
 [3] SummarizedExperiment_1.32.0 Biobase_2.62.0             
 [5] GenomicRanges_1.54.1        GenomeInfoDb_1.38.8        
 [7] IRanges_2.36.0              S4Vectors_0.40.2           
 [9] BiocGenerics_0.48.1         MatrixGenerics_1.14.0      
[11] matrixStats_1.2.0           qreport_1.0-1              
[13] data.table_1.14.10          Hmisc_5.1-1                
[15] future_1.33.0               msigdbr_7.5.1              
[17] rrrSingleCellUtils_0.13.0   nichenetr_2.1.5            
[19] sctransform_0.4.1           Seurat_5.1.0               
[21] SeuratObject_5.0.2          sp_2.1-2                   
[23] reticulate_1.34.0           patchwork_1.2.0            
[25] ggrepel_0.9.4               lubridate_1.9.3            
[27] forcats_1.0.0               stringr_1.5.1              
[29] dplyr_1.1.4                 purrr_1.0.2                
[31] readr_2.1.4                 tidyr_1.3.0                
[33] tibble_3.2.1                ggplot2_3.5.1              
[35] tidyverse_2.0.0             cluster_2.1.4              
[37] cowplot_1.1.2              

loaded via a namespace (and not attached):
  [1] R.methodsS3_1.8.2             GSEABase_1.64.0              
  [3] vroom_1.6.5                   nnet_7.3-18                  
  [5] goftest_1.2-3                 Biostrings_2.70.1            
  [7] TH.data_1.1-2                 vctrs_0.6.5                  
  [9] spatstat.random_3.3-1         RApiSerialize_0.1.2          
 [11] digest_0.6.33                 png_0.1-8                    
 [13] shape_1.4.6                   proxy_0.4-27                 
 [15] deldir_2.0-2                  parallelly_1.36.0            
 [17] MASS_7.3-58.4                 reshape2_1.4.4               
 [19] httpuv_1.6.13                 foreach_1.5.2                
 [21] withr_2.5.2                   ggrastr_1.0.2                
 [23] xfun_0.41                     ggpubr_0.6.0                 
 [25] ellipsis_0.3.2                survival_3.5-5               
 [27] memoise_2.0.1                 ggbeeswarm_0.7.2             
 [29] MatrixModels_0.5-3            systemfonts_1.0.5            
 [31] ragg_1.2.7                    zoo_1.8-12                   
 [33] GlobalOptions_0.1.2           pbapply_1.7-2                
 [35] R.oo_1.26.0                   Formula_1.2-5                
 [37] KEGGREST_1.42.0               promises_1.2.1               
 [39] effsize_0.8.1                 httr_1.4.7                   
 [41] rstatix_0.7.2                 globals_0.16.2               
 [43] fitdistrplus_1.1-11           stringfish_0.16.0            
 [45] rstudioapi_0.15.0             miniUI_0.1.1.1               
 [47] generics_0.1.3                base64enc_0.1-3              
 [49] curl_5.2.0                    babelgene_22.9               
 [51] zlibbioc_1.48.0               ScaledMatrix_1.10.0          
 [53] polyclip_1.10-6               randomForest_4.7-1.1         
 [55] quadprog_1.5-8                ExperimentHub_2.10.0         
 [57] GenomeInfoDbData_1.2.11       SparseArray_1.2.2            
 [59] interactiveDisplayBase_1.40.0 xtable_1.8-4                 
 [61] doParallel_1.0.17             rms_6.8-1                    
 [63] evaluate_0.23                 S4Arrays_1.2.0               
 [65] BiocFileCache_2.10.1          hms_1.1.3                    
 [67] irlba_2.3.5.1                 filelock_1.0.3               
 [69] qs_0.25.7                     colorspace_2.1-0             
 [71] visNetwork_2.1.2              hdf5r_1.3.8                  
 [73] ROCR_1.0-11                   spatstat.data_3.1-2          
 [75] magrittr_2.0.3                lmtest_0.9-40                
 [77] glmGamPoi_1.14.3              later_1.3.2                  
 [79] viridis_0.6.4                 lattice_0.21-8               
 [81] spatstat.geom_3.3-2           future.apply_1.11.1          
 [83] SparseM_1.81                  XML_3.99-0.16                
 [85] scattermore_1.2               shadowtext_0.1.2             
 [87] RcppAnnoy_0.0.21              class_7.3-21                 
 [89] pillar_1.9.0                  nlme_3.1-162                 
 [91] iterators_1.0.14              caTools_1.18.2               
 [93] compiler_4.3.0                beachmat_2.18.1              
 [95] RSpectra_0.16-1               stringi_1.8.3                
 [97] gower_1.0.1                   tensor_1.5                   
 [99] plyr_1.8.9                    crayon_1.5.2                 
[101] abind_1.4-5                   bit_4.0.5                    
[103] sandwich_3.1-0                textshaping_0.3.7            
[105] codetools_0.2-19              multcomp_1.4-25              
[107] recipes_1.0.9                 BiocSingular_1.18.0          
[109] e1071_1.7-14                  GetoptLong_1.0.5             
[111] plotly_4.10.3                 mime_0.12                    
[113] splines_4.3.0                 circlize_0.4.15              
[115] Rcpp_1.0.11                   fastDummies_1.7.3            
[117] quantreg_5.97                 dbplyr_2.4.0                 
[119] sparseMatrixStats_1.14.0      Rttf2pt1_1.3.12              
[121] knitr_1.45                    blob_1.2.4                   
[123] utf8_1.2.4                    clue_0.3-65                  
[125] BiocVersion_3.18.1            listenv_0.9.0                
[127] checkmate_2.3.1               DelayedMatrixStats_1.24.0    
[129] ggsignif_0.6.4                Matrix_1.6-4                 
[131] statmod_1.5.0                 tzdb_0.4.0                   
[133] pheatmap_1.0.12               tweenr_2.0.2                 
[135] pkgconfig_2.0.3               tools_4.3.0                  
[137] cachem_1.0.8                  RSQLite_2.3.4                
[139] viridisLite_0.4.2             DBI_1.2.0                    
[141] fastmap_1.1.1                 rmarkdown_2.25               
[143] scales_1.3.0                  grid_4.3.0                   
[145] ica_1.0-3                     broom_1.0.5                  
[147] AnnotationHub_3.10.0          BiocManager_1.30.22          
[149] dotCall64_1.1-1               graph_1.80.0                 
[151] carData_3.0-5                 RANN_2.6.1                   
[153] rpart_4.1.19                  farver_2.1.1                 
[155] yaml_2.3.8                    DiagrammeR_1.0.10            
[157] foreign_0.8-84                cli_3.6.2                    
[159] leiden_0.4.3.1                lifecycle_1.0.4              
[161] caret_6.0-94                  uwot_0.1.16                  
[163] mvtnorm_1.2-4                 lava_1.7.3                   
[165] backports_1.4.1               annotate_1.80.0              
[167] BiocParallel_1.36.0           timechange_0.2.0             
[169] gtable_0.3.4                  rjson_0.2.21                 
[171] ggridges_0.5.5                progressr_0.14.0             
[173] parallel_4.3.0                pROC_1.18.5                  
[175] limma_3.58.1                  jsonlite_1.8.8               
[177] RcppHNSW_0.5.0                bitops_1.0-7                 
[179] bit64_4.0.5                   Rtsne_0.17                   
[181] spatstat.utils_3.0-5          RcppParallel_5.1.7           
[183] polspline_1.1.25              R.utils_2.12.3               
[185] spatstat.univar_3.0-0         timeDate_4032.109            
[187] lazyeval_0.2.2                shiny_1.8.0                  
[189] htmltools_0.5.7               rappdirs_0.3.3               
[191] glue_1.6.2                    spam_2.10-0                  
[193] XVector_0.42.0                RCurl_1.98-1.13              
[195] gridExtra_2.3                 AUCell_1.24.0                
[197] igraph_2.0.3                  extrafontdb_1.0              
[199] R6_2.5.1                      fdrtool_1.2.17               
[201] labeling_0.4.3                ipred_0.9-14                 
[203] vipor_0.4.7                   DelayedArray_0.28.0          
[205] tidyselect_1.2.0              htmlTable_2.4.2              
[207] ggforce_0.4.1                 car_3.1-2                    
[209] AnnotationDbi_1.64.1          ModelMetrics_1.2.2.2         
[211] rsvd_1.0.5                    munsell_0.5.0                
[213] KernSmooth_2.23-20            htmlwidgets_1.6.4            
[215] ComplexHeatmap_2.18.0         RColorBrewer_1.1-3           
[217] rlang_1.1.2                   spatstat.sparse_3.1-0        
[219] extrafont_0.19                spatstat.explore_3.3-1       
[221] ggnewscale_0.4.9              fansi_1.0.6                  
[223] hardhat_1.3.0                 spacexr_2.2.1                
[225] beeswarm_0.4.0                prodlim_2023.08.28